Non-Parametric Dual-Manifold Mapping via 8-Bit Bounded Transformation Matrices: Challenging FP-centric Hardware Paradigms in Low-Energy AI

Lars Kopp 2026-06-14

The paper addresses the high energy cost of floating-point arithmetic in deep learning hardware. It proposes a non-parametric, training-free framework that performs dual-manifold mapping with 8-bit signed integer transformation matrices and bitwise logic. Experiments report near-perfect reconstruction under 90% truncation sparsity and 20% random node destruction, which the paper describes as extreme holographic resilience. This matters because it challenges the necessity of dense, floating-point-centric GPU accelerators and points toward low-energy neuromorphic edge computing.


Partitioned Tags, Shared Data: Reconciling Strict Cache Isolation with Write-Shared Coherence

Kartik Ramkrishnan, Stephen McCamant, Antonia Zhai, Pen Chung Yew 2026-06-14

SCP addresses the failure of write-shared coherence under strict cache partitioning, a decade-old barrier to deploying eviction-based side-channel defenses in secure shared-OS settings. The method partitions only the tags while sharing a single data pool, sizes the data pool to prevent capacity-driven cross-partition eviction, and routes writes to the LLC after a leakage threshold to mitigate coherence-based leakage. In gem5 experiments, SCP reduces Prime+Probe, Flush+Reload, and shared-writeable-line attacks to no better than random guessing, at a 2.8% increase in LLC SRAM and with IPC within 0.3% of DAWG on SPEC CPU2017. This matters because SCP reconciles strict cache isolation with write-shared coherence, enabling secure partitioning without sacrificing performance or coherence correctness.
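
The idea of partitioned tags over a shared data pool can be illustrated with a toy model. The sketch below is not the paper's design: the class name `TagPartitionedCache`, the LRU policy, the reference counting, and the pool sizing rule (partitions × ways) are all assumptions made for illustration, and the write-routing leakage threshold is not modeled. It only shows that when each partition evicts from its own tags and the pool is large enough for every tag, one partition's misses cannot evict another partition's lines.

```python
from collections import OrderedDict


class TagPartitionedCache:
    """Toy model (hypothetical): per-partition LRU tag stores over one shared data pool."""

    def __init__(self, partitions, ways_per_partition):
        # One private tag store per partition, kept in LRU order.
        self.tags = {p: OrderedDict() for p in partitions}
        self.ways = ways_per_partition
        # Shared data pool: addr -> [value, number of partitions holding a tag].
        self.data = {}
        # Assumed sizing rule: one pool slot per tag, so the pool never overflows.
        self.pool_capacity = len(self.tags) * ways_per_partition

    def access(self, partition, addr):
        """Return True on a tag hit in the caller's own partition, else False."""
        tags = self.tags[partition]
        if addr in tags:
            tags.move_to_end(addr)  # refresh LRU position
            return True
        if len(tags) >= self.ways:
            # Evict only from the caller's own partition.
            victim, _ = tags.popitem(last=False)
            self._release(victim)
        tags[addr] = True
        if addr in self.data:
            self.data[addr][1] += 1  # line already present: share it
        else:
            self.data[addr] = [f"line@{addr}", 1]
        return False

    def _release(self, addr):
        entry = self.data[addr]
        entry[1] -= 1
        if entry[1] == 0:
            del self.data[addr]  # free the slot once no partition references it


cache = TagPartitionedCache(["A", "B"], ways_per_partition=2)
cache.access("A", 1)
for addr in (10, 11, 12):  # partition B thrashes its own tags
    cache.access("B", addr)
still_cached = cache.access("A", 1)  # A's line is unaffected by B's evictions
```

In this model a line shared by two partitions occupies a single pool slot, and it is freed only when the last partition drops its tag.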


ITME: Inference Tiered Memory Expansion with Disaggregated CXL-Hybrid Memories

Hakbeom Jang, Younghoon Min, Sunwoong Kim, Taeyoung Ahn 2026-06-14

ITME addresses the problem of scaling shared context infrastructure for TB-scale LLM inference workloads beyond the capacity of an individual server. The method uses CXL-hybrid memory to provide massive, byte-addressable remote memory expansion, which simplifies the software stack by eliminating complex software-level optimization. Experiments on production-grade SK Hynix CMM and PCIe Gen5 NVMe SSDs, along with an FPGA prototype, show up to a 35.7% throughput improvement over conventional CPU offloading. This matters because ITME enables cost-efficient scaling of shared context layers for agentic and long-context LLMs by proactively managing data movement across the memory-storage hierarchy.
