FlashGPU-sim

2026-03-02 Yixun Hong
GPGPU, Hopper, simulator

About Hopper

Our simulator aims to support the new features of the Hopper architecture.

Reference:

A strange problem

There is a difference of about 5 cycles per instruction between our simulator and real hardware, even though our config files are theoretically correct.

# Instruction latencies and initiation intervals
-ptx_opcode_latency_int 4(1),4,4,4,21,14
# The first field is the int add latency: normally 4, with a special value of 1 used for further analysis

In theory, the latency of an int add instruction should be 4 cycles. Here is the experimental result on real hardware:

INT ADD:
  8 ops: 26.0 cycles (3.25 cycles/op)
  16 ops: 45.0 cycles (2.81 cycles/op)
  32 ops: 123.0 cycles (3.84 cycles/op)
  64 ops: 239.9 cycles (3.75 cycles/op)
  Slope (cycles/op): 3.91
  Intercept: -8.95
  R²: 0.9952

But in the simulator, there is always a 5-cycle overhead:

# Configured 1-cycle in gpgpusim.config
INT ADD:
  8 ops: 47.0 cycles (5.88 cycles/op)
  16 ops: 95.0 cycles (5.94 cycles/op)
  32 ops: 191.0 cycles (5.97 cycles/op)
  64 ops: 383.0 cycles (5.98 cycles/op)
  Slope (cycles/op): 6.00
  Intercept: -1.00
  R²: 1.0000
# Configured 4-cycle in gpgpusim.config
INT ADD:
  8 ops: 68.0 cycles (8.50 cycles/op)
  16 ops: 140.0 cycles (8.75 cycles/op)
  32 ops: 284.0 cycles (8.88 cycles/op)
  64 ops: 572.0 cycles (8.94 cycles/op)
  Slope (cycles/op): 9.00
  Intercept: -4.00
  R²: 1.0000
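
The slope, intercept, and R² values reported for the simulator can be reproduced with an ordinary least-squares fit over the (ops, cycles) pairs. The following is a minimal sketch, not the script that produced the output above; the function name `linear_fit` and the variable names are assumptions made for illustration.

```python
def linear_fit(points):
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r2) for a list of (x, y) pairs.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination: 1 - residual variance / total variance
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    r2 = 1.0 - ss_res / ss_tot
    return slope, intercept, r2


# (ops, total cycles) pairs copied from the simulator results above
SIM_1_CYCLE = [(8, 47.0), (16, 95.0), (32, 191.0), (64, 383.0)]
SIM_4_CYCLE = [(8, 68.0), (16, 140.0), (32, 284.0), (64, 572.0)]

for name, data in [("1-cycle config", SIM_1_CYCLE), ("4-cycle config", SIM_4_CYCLE)]:
    slope, intercept, r2 = linear_fit(data)
    print(f"{name}: slope={slope:.2f} intercept={intercept:.2f} R2={r2:.4f}")
# 1-cycle config: slope=6.00 intercept=-1.00 R2=1.0000
# 4-cycle config: slope=9.00 intercept=-4.00 R2=1.0000
```

The slope is the quantity to compare with the configured latency, because it removes any constant startup cost from the per-op figure.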

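According to the results, the per-instruction latency in the simulator is perfectly constant (R² = 1.0000), but it is 5 cycles higher than the configured value: 6 cycles/op with the 1-cycle configuration and 9 cycles/op with the 4-cycle configuration.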
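
The fixed overhead can be checked directly against the raw simulator numbers. The sketch below assumes a closed form, cycles = ops × (configured latency + 5) − configured latency, which is only an empirical fit to the eight measurements above and not a description of how the GPGPU-Sim pipeline computes timing. The names `FIXED_OVERHEAD` and `predicted_cycles` are hypothetical.

```python
# Overhead in cycles, inferred from the measurements (not a simulator constant)
FIXED_OVERHEAD = 5


def predicted_cycles(num_ops, configured_latency, overhead=FIXED_OVERHEAD):
    """Empirical model: each op costs (configured + overhead) cycles,
    and the total is offset by one configured latency."""
    return num_ops * (configured_latency + overhead) - configured_latency


# configured latency -> {ops: measured total cycles}, copied from the results above
measured = {
    1: {8: 47, 16: 95, 32: 191, 64: 383},
    4: {8: 68, 16: 140, 32: 284, 64: 572},
}

for latency, runs in measured.items():
    for ops, cycles in runs.items():
        # The model reproduces every simulator measurement exactly
        assert predicted_cycles(ops, latency) == cycles
    print(f"configured {latency} cycle(s): effective {latency + FIXED_OVERHEAD} cycles/op")
```

Since the same constant explains both configurations, the extra cycles do not scale with the configured latency, which is consistent with a fixed per-instruction cost.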

The problem lies in the original instruction pipeline of GPGPU-Sim: its issue steps are modeled incorrectly, which adds a fixed overhead to every instruction.

Tuner

https://github.com/accel-sim/accel-sim-framework