We consider the discrete-time multi-agent constrained optimal control problem with
unknown dynamics, partial observability, and input constraints,
in the setting where no performant nominal policy is known. Given \(N\) agents, we aim to design
distributed policies \(\pi_1, \dots, \pi_N\) such that
the task is accomplished:
\[
\min_{\pi_1,\dots,\pi_N} \sum_{k=0}^{\infty} l(\mathbf{x}^k, \boldsymbol{\pi}(\mathbf{x}^k)),
\]
following the unknown dynamics:
\[
\mathbf{x}^{k+1} = f(\mathbf{x}^k, \boldsymbol{\pi}(\mathbf{x}^k)),
\]
and the agents are safe:
\[
h_i^{(m)}(o_i^k) \le 0, \quad o_i^k = O_i(\mathbf{x}^k).
\]
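
To make the three ingredients concrete (cost, dynamics, safety under local observations), the following is a minimal sketch of a finite-horizon rollout. Everything in it is an illustrative assumption rather than part of the formulation above: the single-integrator `dynamics` stands in for the unknown \(f\), `observe` is a hypothetical \(O_i\), `policy` is a hand-written placeholder for \(\pi_i\) that acts on the local observation, `h` is one hypothetical safety function, and the infinite sum is truncated at `horizon`.

```python
from typing import List, Sequence, Tuple

State = List[float]        # joint state x^k (one scalar position per agent in this toy)
Obs = Tuple[float, float]  # local observation o_i^k


def observe(x: State, i: int) -> Obs:
    """Hypothetical O_i: own position and signed gap to the next agent."""
    j = (i + 1) % len(x)
    return (x[i], x[j] - x[i])


def policy(o: Obs) -> float:
    """Placeholder pi_i: proportional step toward the origin, clipped to |u| <= 1."""
    u = -0.5 * o[0]
    return max(-1.0, min(1.0, u))  # assumed input constraint


def dynamics(x: State, u: Sequence[float]) -> State:
    """Toy stand-in for the unknown f: independent single integrators."""
    return [xi + ui for xi, ui in zip(x, u)]


def cost(x: State, u: Sequence[float]) -> float:
    """Assumed stage cost l: quadratic in state and input."""
    return sum(xi ** 2 for xi in x) + 0.1 * sum(ui ** 2 for ui in u)


def h(o: Obs, min_gap: float = 0.2) -> float:
    """Hypothetical safety function: h <= 0 iff the observed gap is at least min_gap."""
    return min_gap - abs(o[1])


def rollout(x0: Sequence[float], horizon: int) -> Tuple[float, bool, State]:
    """Accumulate the truncated cost and check h_i(o_i^k) <= 0 at every visited state."""
    x = list(x0)
    total = 0.0
    safe = True
    for _ in range(horizon):
        obs = [observe(x, i) for i in range(len(x))]
        safe = safe and all(h(o) <= 0.0 for o in obs)
        u = [policy(o) for o in obs]
        total += cost(x, u)
        x = dynamics(x, u)
    return total, safe, x


if __name__ == "__main__":
    print(rollout([2.0, -2.0], 3))
    print(rollout([2.0, -2.0], 20))
```

In this toy, the placeholder policy keeps reducing the cost by driving both agents to the origin, and over a long enough horizon the gap between them falls below `min_gap`, so the safety check fails. This is the tension the constrained formulation is meant to capture: the cost objective alone does not keep the agents safe.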