On the Limits of Performance Portability in Directive-Based GPU Programming

Alessandro Romeo, Nitin Shukla, Stefano Truzzi, Alessio Suriano 2026-06-14

The problem is that directive-based GPU programming faces fundamental trade-offs between performance, portability, and productivity when scientific applications move to exascale systems. The method ports the production-grade magnetohydrodynamics code gPLUTO from OpenACC to OpenMP and evaluates its performance on NVIDIA A100 and AMD MI250X devices. Experimental evidence shows that OpenACC and OpenMP achieve comparable performance on NVIDIA platforms, but the same OpenMP implementation is approximately three times slower at the application level on AMD MI250X, with kernel-level slowdowns of up to 47x attributed to strided memory-access patterns, compiler limitations, and register pressure from C++ abstractions. This matters because it demonstrates that portable performance across GPU architectures requires not only application-level changes but also continued advances in compiler backends and architecture-aware optimization strategies.

Specifying Hardware Communication as Programs

Ernest Ng, Nikil Shyamsunder, Francis Pham, Adrian Sampson 2026-06-14

The problem is that hardware testing requires separate driver and monitor programs for each protocol, which costs manual effort and risks inconsistency. The method is a domain-specific language (DSL) that specifies hardware communication protocols as succinct imperative programs, so that a single specification can both drive and monitor transactions. The abstract does not disclose experimental results, but it describes a tool that uses the DSL specification to infer transaction-level traces from waveforms automatically. This matters because it could eliminate redundant code and reduce bugs in hardware verification for protocols like Wishbone and AXI-Stream.
