Non-Parametric Dual-Manifold Mapping via 8-Bit Bounded Transformation Matrices: Challenging FP-centric Hardware Paradigms in Low-Energy AI

Lars Kopp 2026-06-14

The paper addresses the high energy cost of floating-point arithmetic in deep learning hardware. It proposes a non-parametric, training-free framework that performs dual-manifold mapping with 8-bit signed integer transformation matrices and bitwise logic. The reported experiments show near-perfect reconstruction under 90% truncation sparsity and 20% random node destruction, which the paper presents as evidence of extreme holographic resilience. This matters because it challenges the assumption that dense, floating-point-centric GPU accelerators are necessary, and points toward low-energy neuromorphic edge computing.


Harnessing Routing Foresight for Micro-step-level MoE load balancing in RL Post-training

Yuming Zhou, Haoyang Li, Sheng Lin, Yanfeng Zhao 2026-06-14

ForeMoE addresses expert load imbalance in Mixture-of-Experts (MoE) models during reinforcement learning (RL) post-training, where existing step-level statistics fail due to high-frequency micro-step fluctuations. The method exploits foreseeable routing information from the rollout stage to proactively guide load balancing, using a hierarchical planner to decompose the NP-hard problem and a transfer engine for overlapped expert transfer. Evaluations on 64 GPUs show up to a 1.45× speedup over state-of-the-art RL post-training systems. This matters because it enables efficient scaling of MoE LLMs under the unique workload dynamics of RL post-training, a dominant paradigm in current LLM development.
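
To make the load-balancing idea concrete, here is a minimal sketch of planning expert placement from token counts that are known in advance. It is not ForeMoE's hierarchical planner or transfer engine. It is a generic greedy heuristic (heaviest expert first, onto the least-loaded GPU), and the function name `plan_placement`, the expert labels, and the token counts are all hypothetical.

```python
import heapq

def plan_placement(expert_tokens, num_gpus):
    """Greedy placement: assign the heaviest remaining expert to the least-loaded GPU.

    expert_tokens maps an expert id to the number of tokens routed to it in one
    micro-step (assumed here to be known ahead of time from the rollout stage).
    """
    # Min-heap of (current_load, gpu_id) so the least-loaded GPU is popped first.
    heap = [(0, gpu) for gpu in range(num_gpus)]
    heapq.heapify(heap)

    placement = {}
    by_load = sorted(expert_tokens.items(), key=lambda kv: kv[1], reverse=True)
    for expert, tokens in by_load:
        load, gpu = heapq.heappop(heap)
        placement[expert] = gpu
        heapq.heappush(heap, (load + tokens, gpu))

    # Recompute per-GPU load from the final placement.
    loads = [0] * num_gpus
    for expert, gpu in placement.items():
        loads[gpu] += expert_tokens[expert]
    return placement, loads

# Hypothetical per-expert token counts for a single micro-step.
foreseen = {"e0": 900, "e1": 100, "e2": 450, "e3": 400, "e4": 120, "e5": 30}
placement, loads = plan_placement(foreseen, num_gpus=2)
print(placement, loads)
```

Because the routing for a micro-step is available before that micro-step runs, a plan like this can be computed ahead of time rather than inferred from step-level averages. The greedy rule is only an approximation, which is consistent with the summary's point that the exact problem is NP-hard.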


nomp: A Framework for Building Domain Specific Compilers

Thilina Ratnayaka, Kaushik Kulkarni, Nipuna Fernando, Pubudu Hewavitharana 2026-06-14

Existing GPU programming models force a trade-off between low-level performance and high-level productivity, and no single solution achieves productivity, portability, and performance together. The authors propose nomp, a framework for building domain-specific compilers that combines a pragma-based programming model with a runtime that transforms and generates code based on user-provided metadata. The abstract does not disclose experimental results. This matters because nomp aims to improve programmer productivity without sacrificing performance or portability by enabling reuse of domain-specific optimization patterns.
