CoRAL | Reinforcement learning for multi-agent systems

Reinforcement learning (RL) is a data-driven approach to determining optimal policies in the presence of unknown stochastic dynamics. RL has recently seen a resurgence in optimal control and decision making for dynamical systems, based on adaptive dynamic programming, Q-learning, and actor-critic methods. However, when applied to a multi-agent system (MAS), RL suffers from the curse of dimensionality and poor learning efficiency. To address these challenges, we have investigated strategies that learn coordination policies effectively and efficiently by exploiting structures in a MAS, including

a) time-scale separation in clustered networks, such as power networks,

b) a hierarchical structure driven by global and local reward functions, such as in the multi-drone multi-target tracking application below (click the picture to open the YouTube video).

Currently, we are investigating a graph-based multi-agent reinforcement learning (MARL) problem that specifies topological connections between the agents. Specifically, a state graph, an observation graph, and a reward graph characterize the coupling between the agent dynamics, the constraints on the agents’ observations, and the dependency of each agent’s reward on other agents, respectively. We exploit the graph structures to decompose the learning process without approximation and find that the variance in the policy gradient estimates can be greatly reduced, leading to faster convergence and better sample complexity and scalability. The figure below compares our “MAStAC” (multi-agent structured actor-critic) algorithms with baseline algorithms on a 40-zone temperature control problem.
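To give some intuition for how graph structure can shrink the learning problem, the following minimal Python sketch computes, for each agent, the set of agents whose rewards its actions could possibly influence. This is an illustration only and not the MAStAC algorithm: the edge conventions, the function names `reachable` and `affected_rewards`, and the five-agent example are all assumptions made here, and the observation graph is left out for brevity. The idea is that if an agent's value estimate only needs to sum the rewards in this set rather than all rewards, then rewards it cannot affect no longer add noise to its policy gradient estimate.

```python
from collections import deque


def reachable(adj, start):
    """Return all nodes reachable from `start` (including itself) via BFS."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def affected_rewards(state_graph, reward_graph, agent):
    """Agents whose rewards can depend on `agent`'s actions.

    Assumed conventions (hypothetical, chosen for this sketch):
      state_graph[i]  lists agents whose dynamics are directly driven by agent i
      reward_graph[j] lists agents whose reward directly depends on agent j
    """
    affected = set()
    # Influence propagates over time along state-graph edges ...
    for j in reachable(state_graph, agent):
        affected.add(j)                          # j's own reward
        affected.update(reward_graph.get(j, ()))  # rewards that read j
    return affected


# Hypothetical five-agent example: a chain 0 -> 1 -> 2 and a pair 3 -> 4,
# where agent 3's reward also depends on agent 4.
state_graph = {0: [1], 1: [2], 2: [], 3: [4], 4: []}
reward_graph = {0: [], 1: [], 2: [], 3: [], 4: [3]}

for i in sorted(state_graph):
    print(i, sorted(affected_rewards(state_graph, reward_graph, i)))
```

In this example no agent needs to account for all five rewards, so each local learning problem is smaller than the global one. In a densely connected network the sets would grow toward the full agent set and the benefit would shrink accordingly.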

Relevant Publications

2024

Distributed Multi-Agent Reinforcement Learning Based on Graph-Induced Local Value Functions

Jing, Gangshan, Bai, He, George, Jemin, Chakrabortty, Aranya, and Sharma, Piyush K

IEEE Transactions on Automatic Control 2024

Asynchronous Distributed Reinforcement Learning for LQR Control via Zeroth-Order Block Coordinate Descent

Jing, Gangshan, Bai, He, George, Jemin, Chakrabortty, Aranya, and Sharma, Piyush K

IEEE Transactions on Automatic Control 2024

2021

Model-Free Optimal Control of Linear Multiagent Systems via Decomposition and Hierarchical Approximation

Jing, Gangshan, Bai, He, George, Jemin, and Chakrabortty, Aranya

IEEE Transactions on Control of Network Systems 2021

Scalable Designs for Reinforcement Learning-Based Wide-Area Damping Control

Mukherjee, Sayak, Chakrabortty, Aranya, Bai, He, Darvishi, Atena, and Fardanesh, Bruce

IEEE Transactions on Smart Grid 2021

Learning Distributed Stabilizing Controllers for Multi-Agent Systems

Jing, Gangshan, Bai, He, George, Jemin, Chakrabortty, Aranya, and Sharma, Piyush K

IEEE Control Systems Letters 2021