Hierarchical Multi-Agent Reinforcement Learning with Control Barrier Functions for Safety-Critical Autonomous Systems

H. M. Sabbir Ahmad1, Ehsan Sabouni1, Alexander Wasilkoff1, Param Budhraja1, Zijian Guo1, Songyuan Zhang2, Chuchu Fan2, Christos Cassandras1, Wenchao Li1
1Boston University    2Massachusetts Institute of Technology

HMARL-CBF: Hierarchical skills with CBF-backed safety.

Figure: HMARL-CBF agents in the MetaDrive environments.

Abstract

We address the problem of safe policy learning in multi-agent safety-critical autonomous systems, where each agent must satisfy safety requirements at all times while cooperating with the other agents to accomplish the task. We propose a safe Hierarchical Multi-Agent Reinforcement Learning (HMARL) approach based on Control Barrier Functions (CBFs). Our hierarchical method decomposes the problem into two levels: a high-level policy that learns joint cooperative behavior over skills/options, and a low-level policy that executes those skills safely using CBFs. We validate our approach on challenging scenarios in which a large number of agents must navigate safely through conflicting road networks. Compared with state-of-the-art methods, HMARL-CBF achieves near-perfect (≥ 95%) success/safety rates while also achieving better performance across all the environments.

Problem Definition

We consider the multi-agent constrained optimal control problem with discrete-time, unknown dynamics, partial observability, input constraints, and without a known performant nominal policy. Given \(N\) agents, we aim to design distributed policies \(\pi_1, \dots, \pi_N\), such that:

the joint task cost is minimized:

\[ \min_{\pi_1,\dots,\pi_N} \sum_{k=0}^{\infty} l(\mathbf{x}^k, \boldsymbol{\pi}(\mathbf{x}^k)), \]

subject to the unknown dynamics:

\[ \mathbf{x}^{k+1} = f(\mathbf{x}^k, \boldsymbol{\pi}(\mathbf{x}^k)), \]

and every agent satisfies all of its safety constraints at every time step:

\[ h_i^{(m)}(o_i^k) \le 0, \quad o_i^k = O_i(\mathbf{x}^k), \quad \forall i, m, k, \]

where \(O_i\) maps the global state to agent \(i\)'s local observation and each \(h_i^{(m)}\) encodes one of agent \(i\)'s safety requirements.
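To make the setup concrete, the sketch below illustrates decentralized execution with per-step safety checks. It is a minimal illustration only: the Gym-style environment interface and the env, policies, and constraints objects are assumptions made for this example, not the paper's code.

# Minimal sketch of the problem setup: N distributed policies acting on local
# observations, with the per-agent constraints h_i^(m)(o_i^k) <= 0 checked at
# every step. The environment API below is a hypothetical Gym-style interface.

def rollout(env, policies, constraints, horizon=500):
    """Run the distributed policies and verify safety along the trajectory."""
    obs = env.reset()                        # obs[i] = o_i^k = O_i(x^k)
    total_cost, safe = 0.0, True
    for k in range(horizon):
        # Decentralized execution: each agent acts on its own observation only.
        actions = {i: policies[i](obs[i]) for i in obs}
        obs, cost, done, _ = env.step(actions)
        total_cost += cost                   # accumulates l(x^k, pi(x^k))
        # Safety: every constraint of every agent must hold at every step.
        safe = safe and all(h(obs[i]) <= 0 for i in obs for h in constraints[i])
        if done:
            break
    return total_cost, safe

The learning problem is to find policies that minimize total_cost while safe remains True on every rollout; HMARL-CBF enforces the latter by construction through the low-level CBF-based controller described next.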

HMARL-CBF Overview

Figure: HMARL-CBF block diagram.

The high-level policy selects skills for the agents and is trained with centralized training and decentralized execution (CTDE), while the low-level CBF-based controller executes each skill safely by solving a quadratic program (QP) at every step with affine CBF constraints (and optional CLF terms). This combination yields sample efficiency, decentralized execution, and pointwise-in-time safety guarantees; a sketch of the per-step safety filter follows below.
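As an illustration of the low-level safety layer, the following is a minimal sketch of a per-step CBF-QP filter written with cvxpy. It assumes continuous-time control-affine dynamics \(\dot{x} = f(x) + g(x)u\) and a barrier function with the convention \(h(x) \ge 0\) on the safe set (the opposite sign convention from the \(h_i^{(m)} \le 0\) constraints above); all function names and parameters are illustrative placeholders, not the paper's implementation.

import cvxpy as cp

def cbf_qp_filter(x, u_ref, f, g, h, grad_h, alpha=1.0, u_min=-1.0, u_max=1.0):
    """Project the skill's reference input u_ref onto the CBF-safe input set."""
    u = cp.Variable(u_ref.shape[0])
    # Lie derivatives of h along the dynamics: h_dot = Lf_h + Lg_h @ u
    Lf_h = grad_h(x) @ f(x)
    Lg_h = grad_h(x) @ g(x)
    constraints = [
        Lf_h + Lg_h @ u + alpha * h(x) >= 0,   # affine CBF condition (linear class-K term)
        u >= u_min, u <= u_max,                # input (actuation) limits
    ]
    # Stay as close as possible to the reference input chosen by the skill.
    problem = cp.Problem(cp.Minimize(cp.sum_squares(u - u_ref)), constraints)
    problem.solve(solver=cp.OSQP)
    return u.value

In the hierarchy, u_ref comes from the skill selected by the high-level policy; the QP leaves it unchanged whenever it already satisfies the CBF condition and minimally corrects it otherwise, which is what yields the pointwise safety guarantee.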

Results

Figure: Success rate. HMARL-CBF achieves high task-completion rates and maintains safety consistently across all environments.

Figure: Episode length. HMARL-CBF converges efficiently and completes the tasks in fewer steps than the baselines.

BibTeX

@article{ahmad2025hmarlcbf,
  title={Hierarchical Multi-Agent Reinforcement Learning with Control Barrier Functions for Safety-Critical Autonomous Systems},
  author={Ahmad, H. M. Sabbir and Sabouni, Ehsan and Wasilkoff, Alexander and Budhraja, Param and Guo, Zijian and Zhang, Songyuan and Fan, Chuchu and Cassandras, Christos and Li, Wenchao},
  journal={arXiv preprint arXiv:2507.14850},
  year={2025},
  url={https://arxiv.org/abs/2507.14850v2}
}