Hierarchical Multi-Agent Reinforcement Learning with Control Barrier Functions for Safety-Critical Autonomous Systems

H. M. Sabbir Ahmad1, Ehsan Sabouni1, Alexander Wasilkoff1, Param Budhraja1, Zijian Guo1, Songyuan Zhang2, Chuchu Fan2, Christos Cassandras1, Wenchao Li1
1Boston University    2Massachusetts Institute of Technology

HMARL-CBF: Hierarchical skills with CBF-backed safety.

Figure: HMARL-CBF agents in the MetaDrive environments.

Abstract

We address the problem of safe policy learning in multi-agent safety-critical autonomous systems, where each agent must satisfy safety requirements at all times while cooperating with the other agents to accomplish the task. We propose a safe Hierarchical Multi-Agent Reinforcement Learning (HMARL) approach based on Control Barrier Functions (CBFs). Our hierarchical method decomposes the problem into two levels: a high-level policy that learns joint cooperative behavior over skills/options, and a low-level policy that executes those skills safely using CBFs. We validate our approach on challenging scenarios in which a large number of agents must navigate safely through conflicting road networks. Compared with state-of-the-art methods, HMARL-CBF achieves near-perfect (≥ 95%) success/safety rates while also achieving better performance across all the environments.

Problem Definition

We consider the multi-agent constrained optimal control problem with discrete-time, unknown dynamics, partial observability, input constraints, and without a known performant nominal policy. Given \(N\) agents, we aim to design distributed policies \(\pi_1, \dots, \pi_N\), such that:

the joint task cost is minimized:

\[ \min_{\pi_1,\dots,\pi_N} \sum_{k=0}^{\infty} l(\mathbf{x}^k, \boldsymbol{\pi}(\mathbf{x}^k)), \]

subject to the unknown dynamics:

\[ \mathbf{x}^{k+1} = f(\mathbf{x}^k, \boldsymbol{\pi}(\mathbf{x}^k)), \]

and every agent satisfies all of its safety constraints at every time step:

\[ h_i^{(m)}(o_i^k) \le 0, \quad o_i^k = O_i(\mathbf{x}^k), \quad \forall i, m, k, \]

where \(O_i\) maps the global state to agent \(i\)'s local observation and each \(h_i^{(m)}\) encodes one of agent \(i\)'s safety requirements.
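To make the setup concrete, the sketch below illustrates decentralized execution with per-step safety checks. It is a minimal illustration only: the Gym-style environment interface and the env, policies, and constraints objects are assumptions made for this example, not the paper's code.

# Minimal sketch of the problem setup: N distributed policies acting on local
# observations, with the per-agent constraints h_i^(m)(o_i^k) <= 0 checked at
# every step. The environment API below is a hypothetical Gym-style interface.

def rollout(env, policies, constraints, horizon=500):
    """Run the distributed policies and verify safety along the trajectory."""
    obs = env.reset()                        # obs[i] = o_i^k = O_i(x^k)
    total_cost, safe = 0.0, True
    for k in range(horizon):
        # Decentralized execution: each agent acts on its own observation only.
        actions = {i: policies[i](obs[i]) for i in obs}
        obs, cost, done, _ = env.step(actions)
        total_cost += cost                   # accumulates l(x^k, pi(x^k))
        # Safety: every constraint of every agent must hold at every step.
        safe = safe and all(h(obs[i]) <= 0 for i in obs for h in constraints[i])
        if done:
            break
    return total_cost, safe

The learning problem is to find policies that minimize total_cost while safe remains True on every rollout; HMARL-CBF enforces the latter by construction through the low-level CBF-based controller described next.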

HMARL-CBF Overview

Figure: HMARL-CBF block diagram.

The high-level policy selects skills for the agents and is trained with centralized training and decentralized execution (CTDE), while the low-level CBF-based controller executes each skill safely by solving a quadratic program (QP) at every step with affine CBF constraints (and optional CLF terms). This combination yields sample efficiency, decentralized execution, and pointwise-in-time safety guarantees; a sketch of the per-step safety filter follows below.
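As an illustration of the low-level safety layer, the following is a minimal sketch of a per-step CBF-QP filter written with cvxpy. It assumes continuous-time control-affine dynamics \(\dot{x} = f(x) + g(x)u\) and a barrier function with the convention \(h(x) \ge 0\) on the safe set (the opposite sign convention from the \(h_i^{(m)} \le 0\) constraints above); all function names and parameters are illustrative placeholders, not the paper's implementation.

import cvxpy as cp

def cbf_qp_filter(x, u_ref, f, g, h, grad_h, alpha=1.0, u_min=-1.0, u_max=1.0):
    """Project the skill's reference input u_ref onto the CBF-safe input set."""
    u = cp.Variable(u_ref.shape[0])
    # Lie derivatives of h along the dynamics: h_dot = Lf_h + Lg_h @ u
    Lf_h = grad_h(x) @ f(x)
    Lg_h = grad_h(x) @ g(x)
    constraints = [
        Lf_h + Lg_h @ u + alpha * h(x) >= 0,   # affine CBF condition (linear class-K term)
        u >= u_min, u <= u_max,                # input (actuation) limits
    ]
    # Stay as close as possible to the reference input chosen by the skill.
    problem = cp.Problem(cp.Minimize(cp.sum_squares(u - u_ref)), constraints)
    problem.solve(solver=cp.OSQP)
    return u.value

In the hierarchy, u_ref comes from the skill selected by the high-level policy; the QP leaves it unchanged whenever it already satisfies the CBF condition and minimally corrects it otherwise, which is what yields the pointwise safety guarantee.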

Results

Figure: Success rate. HMARL-CBF achieves high task-completion rates and maintains safety consistently across all environments.

Figure: Episode length. HMARL-CBF converges efficiently and completes the tasks in fewer steps than the baselines.

BibTeX

@article{ahmad2025hmarlcbf,
  title={Hierarchical Multi-Agent Reinforcement Learning with Control Barrier Functions for Safety-Critical Autonomous Systems},
  author={Ahmad, H. M. Sabbir and Sabouni, Ehsan and Wasilkoff, Alexander and Budhraja, Param and Guo, Zijian and Zhang, Songyuan and Fan, Chuchu and Cassandras, Christos and Li, Wenchao},
  journal={arXiv preprint arXiv:2507.14850},
  year={2025},
  url={https://arxiv.org/abs/2507.14850v2}
}