Characterizing Software Aging in GPU-Based LLM Serving Systems

Domenico Cotroneo, Bojan Cukic 2026-06-14

This paper studies software aging in GPU-based LLM serving systems, which differ from traditional CPU-centric systems in their heterogeneous hardware and highly variable workloads. The authors run a 216-hour empirical campaign across six co-located deployments under identical stress, monitoring host, device, and client metrics, and analyze the data with a statistical pipeline that accounts for autocorrelation and multiple testing. They find statistically significant memory aging in all deployments, with leak rates that depend strongly on the serving runtime and configuration. The result is a reproducible framework that connects software aging and rejuvenation research with LLM serving and lays groundwork for future mitigation strategies.
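
The summary does not say which tests the pipeline uses, so the following is only an illustrative sketch of the general idea: test each deployment's memory series for a monotonic trend, then correct for having run several tests. It assumes a Mann-Kendall trend test (without tie correction) and a Benjamini-Hochberg correction, and it handles autocorrelation only crudely by thinning the series. All function names and the sample data are hypothetical.

```python
import math

def thin(series, step):
    """Crude autocorrelation guard (assumption): keep every step-th sample."""
    return series[::step]

def mann_kendall_p(series):
    """Two-sided p-value of the Mann-Kendall trend test (no tie correction)."""
    n = len(series)
    # S counts concordant minus discordant pairs.
    s = sum(
        (series[j] > series[i]) - (series[j] < series[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # 2 * (1 - Phi(|z|)) expressed through the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))

def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True where it is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected

# Hypothetical memory samples (MiB) for two deployments.
leaking = [1000 + 3 * t for t in range(60)]   # steady growth
stable = [1000 + (t % 2) for t in range(60)]  # oscillates, no trend

p_values = [mann_kendall_p(thin(s, 2)) for s in (leaking, stable)]
flags = benjamini_hochberg(p_values)
```

Here `flags` marks which deployments show a significant trend after correction. A real analysis would likely replace the thinning step with a variance correction or pre-whitening, and would estimate the leak rate (for example with a Sen's slope) in addition to testing for its presence.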


ReSCom: A Reconfigurable Spiking Neural Network Accelerator Using Stochastic Computing

Ali Alipour Fereidani, Mohammad Rasoul Roshanshah, Saeed Safari 2026-06-14

ReSCom addresses the high power and area costs of Spiking Neural Network (SNN) hardware with a reconfigurable accelerator that uses stochastic computing for multiplication while preserving exact fixed-point addition and subtraction. Its unified neuron design supports IF, LIF, and Synaptic models, enabling runtime trade-offs between accuracy, latency, and energy. On MNIST inference with a Xilinx Artix-7 FPGA, ReSCom achieves 92.80% accuracy at 0.05 mJ per image while running at 100 MHz, outperforming recent state-of-the-art implementations in energy efficiency. The work demonstrates that stochastic computing can stabilize SNN inference while providing explicit, dynamic control over accuracy-latency-energy trade-offs for resource-constrained edge applications.
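
To illustrate why stochastic computing makes multiplication cheap, here is a minimal software sketch of the textbook unipolar scheme: each value in [0, 1] becomes a random bitstream whose fraction of ones equals the value, and a bitwise AND of two independent streams yields a stream that encodes their product. The summary does not state which encoding or stream generator ReSCom uses, so the unipolar encoding, the function names, and the stream length below are assumptions.

```python
import random

def to_bitstream(value, length, rng):
    """Encode a value in [0, 1] as a unipolar stochastic bitstream."""
    return [1 if rng.random() < value else 0 for _ in range(length)]

def sc_multiply(a, b, length=4096, seed=0):
    """Estimate a * b by ANDing two independently generated bitstreams."""
    rng = random.Random(seed)
    stream_a = to_bitstream(a, length, rng)
    stream_b = to_bitstream(b, length, rng)
    # In hardware this is a single AND gate per bit followed by a counter.
    ones = sum(x & y for x, y in zip(stream_a, stream_b))
    return ones / length

estimate = sc_multiply(0.6, 0.4)  # approximates 0.24
```

The stream length is the knob behind the accuracy-latency trade-off: a longer stream reduces the estimation error but takes more cycles to process, which is one plausible way such a design could expose runtime control over accuracy and energy.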


Specifying Hardware Communication as Programs

Ernest Ng, Nikil Shyamsunder, Francis Pham, Adrian Sampson 2026-06-14

Hardware testing typically requires separate driver and monitor programs for each protocol, which costs manual effort and risks inconsistency between the two. The authors propose a DSL that specifies hardware communication protocols as succinct imperative programs, so that a single specification can both drive and monitor transactions. The abstract does not disclose experimental results, but it describes a tool that uses the DSL specification to infer transaction-level traces automatically from waveforms. This could eliminate redundant code and reduce bugs in hardware verification for protocols such as Wishbone and AXI-Stream.
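
The idea of one specification serving as both driver and monitor can be sketched in a few lines. This is not the authors' DSL, whose syntax the summary does not show; it is a hypothetical toy in which a protocol is a list of per-cycle signal patterns, with `$name` placeholders standing for transaction fields. The two-cycle valid/data protocol, the signal names, and all function names are invented for illustration.

```python
# Hypothetical protocol spec: one transaction spans two clock cycles.
SPEC = [
    {"valid": 1, "data": "$payload"},  # cycle 0: assert valid, present payload
    {"valid": 0},                      # cycle 1: deassert valid
]

def drive(spec, fields):
    """Expand the spec into concrete per-cycle signal values."""
    return [
        {sig: fields[val[1:]] if isinstance(val, str) else val
         for sig, val in cycle.items()}
        for cycle in spec
    ]

def match(spec, window):
    """Return the recovered fields if the window fits the spec, else None."""
    fields = {}
    for pattern, cycle in zip(spec, window):
        for sig, val in pattern.items():
            if sig not in cycle:
                return None
            if isinstance(val, str):
                # Placeholder: bind it, or check it against an earlier binding.
                if fields.setdefault(val[1:], cycle[sig]) != cycle[sig]:
                    return None
            elif cycle[sig] != val:
                return None
    return fields

def monitor(spec, waveform):
    """Scan a waveform and infer the transactions it contains."""
    transactions = []
    t = 0
    while t + len(spec) <= len(waveform):
        found = match(spec, waveform[t:t + len(spec)])
        if found is not None:
            transactions.append(found)
            t += len(spec)
        else:
            t += 1
    return transactions

idle = [{"valid": 0}]
waveform = idle + drive(SPEC, {"payload": 7}) + drive(SPEC, {"payload": 9})
recovered = monitor(SPEC, waveform)
```

Because `drive` and `monitor` both read the same `SPEC`, changing the protocol in one place updates both directions, which is the consistency benefit the summary describes. A real protocol language would also need control flow, handshakes that wait on the device, and timing constraints, none of which this toy models.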
