Learning Latency-Aware Orchestration for Parallel Multi-Agent Systems

Xi Shi, Mengxin Zheng, Qian Lou

1/15/2026

cs.MA, cs.AI, cs.CL

Abstract

Multi-agent systems (MAS) enable complex reasoning by coordinating multiple agents, but they often incur high inference latency from multi-step execution and repeated model invocations, which severely limits their scalability and usability in time-sensitive scenarios. Most existing approaches optimize task performance and inference cost, and explicitly or implicitly assume sequential execution, which makes them suboptimal for controlling latency under parallel execution. In this work, we investigate learning-based orchestration of multi-agent systems with explicit latency supervision under parallel execution. We propose Latency-Aware Multi-agent System (LAMaS), an orchestration framework that enables parallel execution and explicitly optimizes the critical execution path, allowing the controller to construct execution topology graphs with lower latency. Our experiments show that LAMaS reduces critical path length by 38-46% compared to the state-of-the-art baseline for multi-agent architecture search across multiple benchmarks, while maintaining or even improving task performance. These results highlight the importance of explicitly optimizing latency under parallel execution when designing efficient multi-agent systems. The code is available at https://github.com/xishi404/LAMaS
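To make the notion of a critical execution path concrete, the sketch below computes the longest latency-weighted path through a small execution graph and compares it with the latency of running the same agents one after another. It is only an illustration of the metric, not the LAMaS controller or its training procedure; the agent names, latency values, and the helper `critical_path_latency` are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path_latency(latency, deps):
    """Return the longest latency-weighted path through a DAG of agent calls.

    latency: dict mapping agent name -> latency in seconds (made-up values)
    deps:    dict mapping agent name -> agents that must finish before it starts
    """
    graph = {node: set(deps.get(node, ())) for node in latency}
    finish = {}
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(graph).static_order():
        start = max((finish[p] for p in graph[node]), default=0.0)
        finish[node] = start + latency[node]
    return max(finish.values(), default=0.0)


# Hypothetical topology: one planner fans out to three workers,
# whose outputs are merged by an aggregator.
latency = {
    "planner": 2.0,
    "coder": 5.0,
    "tester": 3.0,
    "reviewer": 4.0,
    "aggregator": 1.0,
}
deps = {
    "coder": ["planner"],
    "tester": ["planner"],
    "reviewer": ["planner"],
    "aggregator": ["coder", "tester", "reviewer"],
}

sequential = sum(latency.values())               # every call waits for the previous one
parallel = critical_path_latency(latency, deps)  # planner -> coder -> aggregator
print(f"sequential: {sequential:.1f}s, critical path: {parallel:.1f}s")
```

Under parallel execution, only the slowest branch contributes to end-to-end latency, so a controller that shortens or removes that branch lowers latency even if the total number of agent calls stays the same.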


Code Implementations (10)

xishi404/LAMaS | Official | 100%

code implementation for "Learning Latency-Aware Orchestration for Parallel Multi-Agent Systems"

0 | 0 | Python | Jan 15, 2026 | 3 months ago | MIT

Ibrahim-3d/orchestrator-supaconductor | 67%

Multi-agent orchestration system for Claude Code with parallel execution, automated quality gates, Board of Directors, and bundled Superpowers skills

31733 | Feb 17, 2026 | 1 week ago | AGPL-3.0

ai-agents, claude, claude-code, claude-code-plugin, evaluate-loop, +4 more

cubetribe/ClaudeCode_GodMode-On | 66%

Self-orchestrating multi-agent system for Claude Code. 8 AI agents, parallel quality gates, skills architecture & plugin support. Optimized for Claude 4.6.

193 | Dec 7, 2025 | 4 weeks ago | NOASSERTION

ai-agents, automation, claude-4-6, claude-code, code-quality, +6 more

yohey-w/multi-agent-shogun | 65%

Samurai-inspired multi-agent system for Claude Code. Orchestrate parallel AI tasks via tmux with shogun → karo → ashigaru hierarchy.

1,048 | 240 | Jan 25, 2026 | 1 month ago | MIT

ai-agent, anthropic, automation, claude-code, llm, +5 more

commitbyrajat/knowledge_aware_agent | 63%

A high-performance agentic RAG system combining Graphiti's temporal knowledge graphs with LangGraph's multi-agent orchestration to achieve 100x faster retrieval speeds than traditional RAG through intelligent graph-based indexing and parallel agent processing.

81 | Jun 29, 2025 | 9 months ago

YASSERRMD/AiMesh | 61%

High-performance Rust message queue for AI agents. 5M+ msgs/sec, <1ms latency, cost-aware routing, semantic dedup, scatter-gather orchestration.

20 | Dec 31, 2025 | 2 months ago | MIT

sahinulhaque/agent-support-pilot | 58%

Stateful Multi-Agent Orchestration with LangGraph. Features: Intent-aware routing, async SQL data retrieval, optimized RAG ingestion, and persistent session checkpointing. Architected for 40% lower latency and enterprise scalability.

0 | 0 | Feb 9, 2026 | 1 month ago

FibrinLab/pulse | 56%

Latency-Aware Orchestrator for Autonomous Agents

0 | 0 | Jan 13, 2026 | 3 months ago

MagdyNasr41/implementation-of-parallel-copmuting-concepts | 52%

A collection of Python implementations exploring core concepts in parallel computing and computer architecture, including cache hierarchy simulation, memory latency analysis, and other experiments that illustrate how performance scales across multi-level systems.

0 | 0 | Nov 3, 2025 | 5 months ago

NingBellWind/AppleS_Artifact | 52%

AppleS aims to improve database scalability by delivering the right amount and pattern of user parallel I/O requests to the database system under excessive user parallelism, aligning with the concurrency supported by the database and its underlying I/O stack. In doing so, AppleS improves user-level I/O performance in terms of user-level I/O fairness, throughput and latency stability. Implemented as a user-space module based on system call interception, AppleS is compatible with and portable to different types/versions of databases, different versions of OS kernels and their resource management tools, e.g., Cgroups.

11 | Feb 1, 2022 | 4 years ago
