Forum Discussion

Altera_Forum
Honored Contributor
8 years ago

Kernel performance with profiling

Hello again guys.

I'm struggling to understand the profiler results for two versions of a kernel (one with an unroll factor of 128 and another with 32).

The version with an unroll factor of 32 outperforms the one with 128 by 5 seconds for an input matrix of 20000 x 1000.

Stats are:

Metric | Unroll 32 | Unroll 128

Activity | 96% | 25%

Memory (global) BW | 15182 MB/s | 11885 MB/s

Kernel clock frequency | 244 MHz | 185 MHz

Stall % | 14.49% | 15.1%
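One way to read these figures side by side is to normalize the global memory bandwidth by the kernel clock, which gives bytes moved per clock cycle. The sketch below is only arithmetic on the numbers quoted above; the helper name `bytes_per_cycle` is made up for illustration, and it assumes "MB" means 10^6 bytes.

```python
# Figures copied from the post. Assumption: 1 MB = 10**6 bytes.
stats = {
    32: {"bw_mb_s": 15182.0, "clock_mhz": 244.0},
    128: {"bw_mb_s": 11885.0, "clock_mhz": 185.0},
}


def bytes_per_cycle(bw_mb_s, clock_mhz):
    """Hypothetical helper: (1e6 bytes/s) / (1e6 cycles/s) = bytes per cycle."""
    return bw_mb_s / clock_mhz


for unroll, s in stats.items():
    per_cycle = bytes_per_cycle(s["bw_mb_s"], s["clock_mhz"])
    print(f"unroll {unroll}: {per_cycle:.1f} bytes per kernel clock cycle")

# Ratios of the 32 version to the 128 version.
clock_ratio = stats[32]["clock_mhz"] / stats[128]["clock_mhz"]
bw_ratio = stats[32]["bw_mb_s"] / stats[128]["bw_mb_s"]
print(f"clock ratio (32 vs 128): {clock_ratio:.2f}")
print(f"bandwidth ratio (32 vs 128): {bw_ratio:.2f}")
```

Both versions come out at roughly 62 to 64 bytes per cycle, so the bandwidth gap appears to track the lower kernel clock of the 128 version rather than a difference in how much data each cycle moves. This does not by itself account for the difference in activity.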

I don't understand what is happening: the stall percentage is slightly higher for the 128 version, while the faster version (32) also shows the higher memory bandwidth.

15 Replies