Filtered by: Model × Data × Clear all

ITME: Inference Tiered Memory Expansion with Disaggregated CXL-Hybrid Memories

Hakbeom Jang, Younghoon Min, Sunwoong Kim, Taeyoung Ahn 2026-06-14

ITME addresses the problem of scaling shared context infrastructure for TB-scale LLM inference workloads beyond individual server capacity. The method leverages CXL-hybrid memory to provide massive, byte-addressable remote memory expansion, simplifying the software stack by eliminating complex software-level optimization. Experimental evidence from production-grade SK Hynix CMM and PCIe Gen5 NVMe SSDs, along with an FPGA prototype, shows up to a 35.7% throughput improvement over conventional CPU-offloading. This matters because ITME enables cost-efficient scaling of shared context layers for agentic and long-context LLMs by proactively managing data movement across the memory-storage hierarchy.

PDF

Structured Testbench Generation for LLM-Driven HDL Design and Verification-Oriented Data Curation

En-Ming Huang, Yu-Hung Kao, Ren-Hao Deng, Wei-Po Hsin 2026-06-14

Problem: Automated testbench generation is a bottleneck in LLM-driven RTL workflows due to stochastic, costly, and low-coverage outputs from prompt-based methods. Method: STG (Structured Testbench Generation) exploits hardware design structure to produce deterministic testbenches. Finding: STG runs 720x faster than iterative LLM-based flows, achieves higher coverage, reduces false-pass verdicts, and is 11x faster and 127x more energy-efficient than LLM-based filtering on a single CPU core. Why it matters: STG enables rapid, reliable verification for LLM-driven design, improves RTL benchmarks by exposing faulty testbenches, and yields state-of-the-art distilled models with reduced node count.

PDF

nomp: A Framework for Building Domain Specific Compilers

Thilina Ratnayaka, Kaushik Kulkarni, Nipuna Fernando, Pubudu Hewavitharana 2026-06-14

Problem: Existing GPU programming models force a trade-off between low-level performance and high-level productivity, with no single solution achieving all three goals of productivity, portability, and performance. Method: The authors propose nomp, a framework for building domain-specific compilers that uses a pragma-based programming model and a runtime for code transformation and generation based on user-provided metadata. Finding or experimental evidence: The abstract does not disclose experimental results. Why it matters: nomp aims to improve programmer productivity without sacrificing performance or portability by enabling reuse of domain-specific optimization patterns.

PDF