Ph.D. Dissertation · UT Austin

Communication and Generalization in Multi-Agent Learning

Jiaxun Cui · Electrical and Computer Engineering
Advisor: Peter Stone
The University of Texas at Austin, 2025

Abstract

Multi-agent learning aims to allow artificial intelligence (AI) agents to learn from interactions with other agents in an environment. However, as AI increasingly integrates into real-world systems, significant challenges arise in how to interact and communicate robustly with a variety of other agents, particularly in complex environments such as autonomous driving, where humans and AI agents coexist. This dissertation investigates how agents can be trained to communicate effectively with, and generalize to, diverse partners (including humans) in simulations of real-world scenarios.

To address this challenge, this dissertation explores three key dimensions: (1) learning communication-supporting representations that facilitate coordination, (2) developing multi-agent policies that generalize to new teammates or opponents, and (3) learning to collaborate with human-like agents or to use human language. It makes novel contributions along each dimension.

First, the dissertation presents Coopernaut, a framework that learns compact, transmittable representations from local observations to support communication among autonomous vehicles under bandwidth constraints. It also introduces LLM+Debrief, which enables embodied agents to coordinate in driving scenarios by generating and interpreting natural language messages, paving the way for human-compatible agent communication.

Second, it introduces MACTA, a training framework combining reinforcement learning and game theory that produces robust policies capable of generalizing to unseen and adaptive opponents. It also introduces L-BRDiv, a teammate generation strategy that promotes behavioral diversity during training, improving generalization and performance in ad hoc teamwork settings.

Third, the dissertation investigates mixed-autonomy traffic coordination through decentralized training in environments with both human-proxy and AI agents. Empirical results demonstrate that even a small number of trained autonomous vehicles can collaborate effectively to influence human behavior and improve overall traffic efficiency without requiring centralized control.

Collectively, these contributions advance multi-agent AI by unifying communication, generalization, and human–AI collaboration. Evaluated in both toy domains and realistic simulated environments, with a primary focus on autonomous driving and hardware security, the work demonstrates how agents can adapt to novel partners and communicate effectively in human-interpretable ways.

Structure

Part I

Background

Motivation, contributions, and the formal foundations used throughout the dissertation — MDPs, partially observable stochastic games, agent populations, and learning objectives.

  • Ch. 1 Introduction
  • Ch. 2 Background and Notation
Part II

Learning to Communicate

From shared latent representations across networked vehicles to explicit natural-language messages between agents driven by large language models.

  • Ch. 3 Learning to Communicate in Latent Representations · Coopernaut, CVPR 2022
  • Ch. 4 Learning to Communicate in Natural Language · CoopReflect, AAMAS 2026
Part III

Learning to Generalize

Training pipelines and theoretical tools that produce agents which remain robust when deployed with unseen co-players — both adversarial opponents and cooperative teammates.

  • Ch. 5 Generalizing to Adversarial Opponents · MACTA, ICLR 2023
  • Ch. 6 Generalizing to Cooperative Teammates · L-BRDiv / Minimum Coverage Sets, AAAI 2024
Part IV

Learning with Human Proxies

Centralized, modular-transfer, and distributed multi-agent driving policies that coordinate with rule-based human-proxy traffic to dissolve stop-and-go congestion in mixed-autonomy highways.

  • Ch. 7 Collaborating with Human Proxies · Scalable Multiagent Driving, AAMAS 2021 (Oral)
Part V

Related and Future Work

Connections to the broader literature, open problems, and a roadmap toward superhuman Pokémon AI, open-ended ad hoc teamwork, multi-agent strategic reasoning for LLMs, and multi-agent collaboration safety.

  • Ch. 8 Related Work
  • Ch. 9 Future Work
  • Ch. 10 Conclusion