Altera_Forum
Honored Contributor
9 years ago
Kernel performance with profiling
Hello again guys.
I'm struggling to understand the profiler results for two versions of a kernel, one with an unroll factor of 128 and one with an unroll factor of 32. For a 20000 x 1000 input matrix, the unroll-32 version finishes 5 seconds faster than the unroll-128 version. The profiler stats are:

Metric                     Unroll 32     Unroll 128
Activity                   96%           25%
Global memory bandwidth    15182 MB/s    11885 MB/s
Kernel clock frequency     244 MHz       185 MHz
Stall                      14.49%        15.1%

I don't understand what is happening here: the stall percentage increases with the larger unroll factor, while the faster version also achieves better memory bandwidth.
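A quick way to compare the two columns is to normalize the bandwidth by the clock frequency, which gives the bytes moved per kernel clock cycle. The sketch below does that arithmetic in Python using only the numbers from the table. It assumes the profiler's "MB" means 10^6 bytes and that the reported bandwidth is an average over the whole run; `PROFILES` and `bytes_per_cycle` are illustrative names, not part of any SDK.

```python
# Profiler figures from the table above:
# (unroll factor, global memory bandwidth in MB/s, kernel clock in MHz)
PROFILES = [
    (32, 15182.0, 244.0),
    (128, 11885.0, 185.0),
]


def bytes_per_cycle(bw_mb_s: float, fmax_mhz: float) -> float:
    """Return bytes moved per kernel clock cycle.

    Assumes MB = 10**6 bytes and MHz = 10**6 cycles/s, so the
    10**6 factors cancel and MB/s divided by MHz is bytes/cycle.
    """
    return bw_mb_s / fmax_mhz


if __name__ == "__main__":
    for unroll, bw, fmax in PROFILES:
        print(f"unroll {unroll:3d}: {bytes_per_cycle(bw, fmax):.1f} bytes/cycle")

    # Compare how much faster the clock is vs. how much higher the bandwidth is.
    (_, bw32, f32), (_, bw128, f128) = PROFILES
    print(f"clock ratio (32/128):     {f32 / f128:.3f}")
    print(f"bandwidth ratio (32/128): {bw32 / bw128:.3f}")
```

Run as-is, it prints about 62.2 bytes/cycle for unroll 32 and 64.2 bytes/cycle for unroll 128, with a clock ratio of 1.319 against a bandwidth ratio of 1.277. Per cycle the two versions move roughly the same amount of data, which suggests that most of the bandwidth gap follows the lower clock frequency of the unroll-128 build rather than the small difference in stall percentage.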