Daily | Yixun Hong


Arbor: Tree Search as a Cognition Layer for Autonomous Agents

Neha Prakriya, Chaojun Hou, Zheng Gong, Huasha Zhao 2026-06-14

Search Cognition Autonomy Agents

Arbor targets autonomous optimization in large, stateful action spaces with a multi-agent framework that uses structured tree search as a shared cognition layer. It pairs an Orchestrator agent with a Critic agent in a checks-and-balances architecture, with an explicit search tree of scored hypotheses serving as working memory. In experiments, Arbor achieves up to a 193% inference throughput-latency Pareto improvement over vendor-optimized baselines, whereas a single agent without the harness plateaus at a 33% improvement and crashes within hours. This enables fully autonomous, hardware-agnostic, and reproducible multi-day optimization campaigns across the full LLM inference stack.
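
To make the "search tree of scored hypotheses" idea concrete, here is a minimal sketch. It is not Arbor's implementation: the summary does not give the algorithm, so the `Hypothesis` class, the `propose` and `critique` callbacks (standing in for the Orchestrator and the Critic), and the greedy best-first expansion rule are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterator, Optional


@dataclass
class Hypothesis:
    """One tree node: a proposed change plus the score the critic gave it."""
    description: str
    score: float = 0.0
    parent: Optional[Hypothesis] = None
    children: list[Hypothesis] = field(default_factory=list)


def all_nodes(root: Hypothesis) -> Iterator[Hypothesis]:
    """Yield every node in the tree (depth-first)."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(node.children)


def search(root: Hypothesis,
           propose: Callable[[Hypothesis], list[str]],   # hypothetical orchestrator role
           critique: Callable[[Hypothesis], float],      # hypothetical critic role
           rounds: int) -> Hypothesis:
    """Greedy best-first expansion (an assumed policy): each round, expand the
    highest-scoring leaf, score its children, and keep everything in the tree."""
    root.score = critique(root)
    for _ in range(rounds):
        leaves = [n for n in all_nodes(root) if not n.children]
        node = max(leaves, key=lambda n: n.score)
        for desc in propose(node):
            child = Hypothesis(desc, parent=node)
            child.score = critique(child)
            node.children.append(child)
    return max(all_nodes(root), key=lambda n: n.score)


def lineage(node: Hypothesis) -> list[str]:
    """Descriptions from the root down to `node`, i.e. the reasoning trail."""
    path = []
    while node is not None:
        path.append(node.description)
        node = node.parent
    return path[::-1]


# Toy problem: hypotheses are integers, and the score rewards closeness to 10.
def toy_propose(h: Hypothesis) -> list[str]:
    n = int(h.description)
    return [str(n + 1), str(n * 2)]


def toy_critique(h: Hypothesis) -> float:
    return -abs(10 - int(h.description))


best = search(Hypothesis("1"), toy_propose, toy_critique, rounds=5)
print(best.description, lineage(best))
```

Because rejected branches stay in the tree with their scores, the tree doubles as a persistent record of what was tried, which is the working-memory role the summary describes.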

Eidola: Modeling Multi-GPU Network Communication Traffic in Distributed AI Workloads

Ranganath R. Selagamsetty, Matthew Poremba, Bradford M. Beckmann, Joshua San Miguel 2026-06-14

Gem5 Interconnect Microarchitecture Simulation HPC Compiler Runtime GPU

Eidola models the irregular, transient inter-GPU communication traffic of distributed AI workloads, which existing tools fail to capture because of fine-grained synchronization and peer-to-peer writes. It is a scalable gem5 extension that uses annotated timing profiles from real applications to emulate peer-to-peer GPU writes with cycle-level precision. In experiments, Eidola reproduces the variability in fused kernel execution and confirms that a SyncMon-inspired mechanism reduces polling-related memory traffic. This gives architects a flexible platform for exploring interconnect bandwidth and latency in modern multi-GPU systems.
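
As a rough illustration of what replaying a timing profile of peer-to-peer writes involves, the toy model below serializes writes on per-direction links and reports when each one arrives. It is not Eidola or gem5 code: the `P2PWrite` record, the `replay` function, and the fixed-latency, fixed-bandwidth link model are all assumptions, and the model is far simpler than a cycle-level simulator.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class P2PWrite:
    """One profiled write (hypothetical record layout)."""
    cycle: int    # cycle at which the source GPU issues the write
    src: int      # source GPU id
    dst: int      # destination GPU id
    nbytes: int   # payload size in bytes


def replay(writes, link_latency, bytes_per_cycle):
    """Replay writes in issue order and return (write, arrival_cycle) pairs.

    Each (src, dst) direction is treated as its own link that moves
    `bytes_per_cycle` bytes per cycle and adds `link_latency` cycles of delay.
    Writes on the same link queue behind one another.
    """
    link_free_at = defaultdict(int)  # cycle at which each link becomes idle
    arrivals = []
    for w in sorted(writes, key=lambda w: w.cycle):
        link = (w.src, w.dst)
        start = max(w.cycle, link_free_at[link])
        transfer = -(-w.nbytes // bytes_per_cycle)  # ceiling division
        link_free_at[link] = start + transfer
        arrivals.append((w, start + transfer + link_latency))
    return arrivals


profile = [
    P2PWrite(cycle=0, src=0, dst=1, nbytes=256),
    P2PWrite(cycle=2, src=0, dst=1, nbytes=128),  # queues behind the first write
    P2PWrite(cycle=2, src=1, dst=0, nbytes=64),   # opposite direction, no contention
]
for write, arrival in replay(profile, link_latency=10, bytes_per_cycle=64):
    print(write, "arrives at cycle", arrival)
```

Sweeping `link_latency` and `bytes_per_cycle` over the same profile is the kind of bandwidth and latency exploration the summary refers to, here in a heavily simplified form.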
