About Hopper
Our simulator aims to support the new features of the Hopper architecture.
Reference:
A strange problem
There is a 4-cycle difference between our simulator and real hardware, even though our config files are theoretically correct.
# Instruction latencies and initiation intervals
-ptx_opcode_latency_int 4(1),4,4,4,21,14
# use a special 1 for int add for further analysis
In theory, the latency of the int add instruction should be 4 cycles. Here is the experimental result on real hardware:
INT ADD:
8 ops: 26.0 cycles (3.25 cycles/op)
16 ops: 45.0 cycles (2.81 cycles/op)
32 ops: 123.0 cycles (3.84 cycles/op)
64 ops: 239.9 cycles (3.75 cycles/op)
Slope (cycles/op): 3.91
Intercept: -8.95
R²: 0.9952
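The slope, intercept, and R² above describe a straight-line fit of total cycles against the number of ops. Below is a minimal sketch of such a fit, assuming ordinary least squares (the notes do not state which fitting method was used). The helper name `fit_line` is hypothetical. Because the inputs here are the rounded values printed above, the intercept may differ slightly from the reported one.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept; also returns R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


ops = [8, 16, 32, 64]
hw_cycles = [26.0, 45.0, 123.0, 239.9]  # real-hardware INT ADD measurements listed above

slope, intercept, r2 = fit_line(ops, hw_cycles)
print(f"slope={slope:.2f} intercept={intercept:.2f} r2={r2:.4f}")
```

The slope is the useful number here: it estimates the cycles per op, while the intercept absorbs fixed measurement cost.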
But in the simulator, there is always a 5-cycle overhead:
# Configured 1-cycle in gpgpusim.config
INT ADD:
8 ops: 47.0 cycles (5.88 cycles/op)
16 ops: 95.0 cycles (5.94 cycles/op)
32 ops: 191.0 cycles (5.97 cycles/op)
64 ops: 383.0 cycles (5.98 cycles/op)
Slope (cycles/op): 6.00
Intercept: -1.00
R²: 1.0000
# Configured 4-cycle in gpgpusim.config
INT ADD:
8 ops: 68.0 cycles (8.50 cycles/op)
16 ops: 140.0 cycles (8.75 cycles/op)
32 ops: 284.0 cycles (8.88 cycles/op)
64 ops: 572.0 cycles (8.94 cycles/op)
Slope (cycles/op): 9.00
Intercept: -4.00
R²: 1.0000
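According to these results, the per-instruction latency in the simulator is perfectly uniform (R² = 1.0000), but it is always 5 cycles higher than the configured value: a configured latency of 1 cycle gives a slope of 6.00 cycles/op, and a configured latency of 4 cycles gives 9.00 cycles/op.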
The problem lies in the original instruction pipeline of GPGPU-Sim: incorrect issue steps add a fixed overhead to every instruction.
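The fixed overhead can be checked by subtracting the configured latency from the fitted slope of each simulator run. Here is a short sketch using only the numbers reported above; the variable names are illustrative and are not part of the simulator.

```python
# (configured latency in gpgpusim.config, fitted slope in cycles/op) for each simulator run
sim_runs = [(1, 6.00), (4, 9.00)]

# Overhead = observed cycles per op minus the configured latency
overheads = [slope - configured for configured, slope in sim_runs]
print("simulator overhead per op:", overheads)

# Same overhead in both runs, so it does not scale with the configured latency
overhead_is_fixed = len(set(overheads)) == 1
print("overhead is fixed:", overhead_is_fixed)

# Real hardware for comparison: fitted slope vs. the expected 4-cycle latency
hw_slope = 3.91
expected_latency = 4
hw_deviation = hw_slope - expected_latency
print(f"hardware deviation from expected latency: {hw_deviation:+.2f} cycles/op")
```

Since changing the configured latency from 1 to 4 moves the slope by exactly 3 cycles, the configured value is being applied correctly and the extra cycles come from elsewhere in the pipeline.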
Tuner
https://github.com/accel-sim/accel-sim-framework