Daily | Yixun Hong

A Photonic-CXL Memory Appliance for Scalable KV Cache Management in LLM Inference

Jing Ding, Yash Nishant, Chandrish Ambati, Jyothsna Kamati 2026-07-31

Cache LLM Inference GPU Architecture Simulation Workload

The paper addresses the memory wall in LLM inference: KV caches demand tens of terabytes of capacity at hundreds of GB/s of bandwidth, which exceeds what current memory tiers can provide. The proposed Marvell Photonic Fabric Memory Appliance replaces electrical switches with a passive fiber shuffle in a switch-free full-crossbar topology, delivering 32 TB of shared memory across 16 hosts through a hybrid photonic-CXL architecture. Emulation results show a latency reduction of more than 50% versus electrical CXL pools, and simulation shows a 6.6x improvement in time-to-first-token, achieved by eliminating cache eviction cliffs in multi-turn workloads. The work matters because it makes TB-scale shared memory practical for concurrent long-context users, overcoming the scalability limits of electrical CXL pooling in real deployments.
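To give a feel for why long-context serving reaches the terabyte range, here is a rough back-of-the-envelope sketch of KV cache sizing. It is not taken from the paper: the model shape (80 layers, 8 KV heads, head dimension 128, 2-byte elements), the 128K-token context, and the reading of "32 TB" as 32 TiB are all illustrative assumptions, and the function names are hypothetical.

```python
# Back-of-the-envelope KV cache sizing. All model parameters below are
# illustrative assumptions, not figures from the paper.

TIB = 2**40  # assumption: treat the appliance's "32 TB" as 32 TiB
GIB = 2**30


def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes of KV cache stored per token.

    The factor of 2 accounts for one key and one value vector
    per layer and per KV head.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def sessions_that_fit(pool_bytes, context_tokens, per_token_bytes):
    """Number of full-context sessions whose KV cache fits in the pool."""
    return pool_bytes // (context_tokens * per_token_bytes)


# Hypothetical model shape with grouped-query attention.
per_token = kv_bytes_per_token(num_layers=80, num_kv_heads=8, head_dim=128)
context_tokens = 128 * 1024  # assumed 128K-token context per session
per_session = per_token * context_tokens

print(f"KV cache per token:   {per_token / 1024:.0f} KiB")
print(f"KV cache per session: {per_session / GIB:.0f} GiB")
print(f"Sessions in 32 TiB:   {sessions_that_fit(32 * TIB, context_tokens, per_token)}")
```

Under these assumptions a single full-context session needs about 40 GiB of KV cache, so a few hundred concurrent sessions already consume tens of terabytes. This is the capacity regime the summary describes.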
