Daily | Yixun Hong


Characterizing Software Aging in GPU-Based LLM Serving Systems

Domenico Cotroneo, Bojan Cukic 2026-06-14

GPU LLM Serving

The paper addresses software aging in GPU-based LLM serving systems, which differ from traditional CPU-centric systems in their heterogeneous hardware and highly variable workloads. The method is a 216-hour empirical campaign across six co-located deployments under identical stress, monitoring host, device, and client metrics and applying a statistical pipeline that accounts for autocorrelation and multiple testing. Experimental evidence shows statistically significant memory aging in all deployments, with leak rates strongly dependent on the serving runtime and configuration. This matters because it provides a reproducible framework that bridges software aging and rejuvenation research with LLM serving and enables future mitigation strategies.
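The summary does not name the specific tests in the statistical pipeline. As a rough illustration only, the sketch below assumes a combination often seen in software aging studies: block averaging to dampen autocorrelation, a Mann-Kendall trend test with Sen's slope as the leak-rate estimate, and a Holm correction across metrics. All function names here are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations
from statistics import median


def block_means(series, size):
    """Average consecutive blocks of `size` samples (a crude way to reduce
    serial correlation); a trailing partial block is dropped."""
    full = len(series) - len(series) % size
    return [sum(series[i:i + size]) / size for i in range(0, full, size)]


def mann_kendall(series):
    """Mann-Kendall trend test without tie correction.
    Returns (S statistic, z score, two-sided p-value)."""
    n = len(series)
    # S counts increasing pairs minus decreasing pairs over all i < j.
    s = sum((b > a) - (b < a) for a, b in combinations(series, 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return s, z, p


def sen_slope(series):
    """Sen's slope: median of pairwise slopes, in units per sample."""
    return median((series[j] - series[i]) / (j - i)
                  for i, j in combinations(range(len(series)), 2))


def holm_reject(p_values, alpha=0.05):
    """Holm step-down correction; returns one reject flag per hypothesis."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] > alpha / (m - rank):
            break  # all larger p-values are also retained
        reject[idx] = True
    return reject
```

Under these assumptions, each monitored metric (for example, resident memory per deployment) would be block-averaged, tested with `mann_kendall`, and the resulting p-values passed together to `holm_reject`; `sen_slope` then gives a trend estimate that is robust to outliers for the metrics that remain significant.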


ReSCom: A Reconfigurable Spiking Neural Network Accelerator Using Stochastic Computing

Ali Alipour Fereidani, Mohammad Rasoul Roshanshah, Saeed Safari 2026-06-14

Neural Network Accelerator

ReSCom addresses the high power and area costs of Spiking Neural Network (SNN) hardware by introducing a reconfigurable accelerator that uses stochastic computing for multiplication while preserving exact fixed-point addition and subtraction. The method employs a unified neuron design supporting IF, LIF, and Synaptic models, enabling runtime trade-offs between accuracy, latency, and energy. On MNIST inference with a Xilinx Artix-7 FPGA, ReSCom achieves 92.80% accuracy at 0.05 mJ per image and 100 MHz, outperforming recent state-of-the-art implementations in energy efficiency. This matters because it demonstrates that stochastic computing can stabilize SNN inference while providing explicit, dynamic control over accuracy-latency-energy trade-offs for resource-constrained edge applications.
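To make the stochastic-computing idea concrete, here is a minimal software sketch of multiplication on bitstreams. It assumes the common unipolar encoding, where a value in [0, 1] is the probability of a 1 bit and a bitwise AND of two independent streams approximates the product. The summary does not state which encoding or stream lengths ReSCom uses, so the function names and the `length` parameter are illustrative assumptions.

```python
import random


def to_bitstream(p, length, rng):
    """Encode p in [0, 1] as a unipolar stochastic bitstream:
    each bit is 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def sc_multiply(a, b, length=1024, seed=0):
    """Approximate a * b by AND-ing two independent bitstreams and
    counting the fraction of 1s in the result."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    stream_a = to_bitstream(a, length, rng)
    stream_b = to_bitstream(b, length, rng)
    ones = sum(x & y for x, y in zip(stream_a, stream_b))
    return ones / length


if __name__ == "__main__":
    # Longer streams cost more cycles but tend to give smaller error.
    for n in (64, 1024, 16384):
        print(n, sc_multiply(0.5, 0.5, length=n))
```

In this sketch the stream length is the knob that trades accuracy against latency: the estimate's standard deviation shrinks roughly with the square root of the length, while the number of cycles grows linearly. Accumulating the products with ordinary addition, rather than stochastic adders, corresponds to the exact fixed-point addition the summary describes.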
