Altera_Forum
12 years ago
How does workgroup size impact the kernel performance?
Hello all,
I have implemented a design with a single compute unit pipeline on my Nallatech PCIe_385nA7 board, and I want to measure how the workgroup size affects performance.

My first idea was that increasing the workgroup size with "__attribute__((reqd_work_group_size(WKG_HOR_SIZE, WKG_VER_SIZE, 1)))" would improve performance, because work-items are mapped onto the device with the granularity of a workgroup. For example, if I had 50 workgroups, each of them would be mapped sequentially onto the compute unit. That is to say:
- for a workgroup size of 1x1 in an NDRange of 20x20, 400 workgroups are mapped sequentially. Low performance, because the pipeline is certainly not filled.
- for a workgroup size of 10x10 in an NDRange of 20x20, 4 workgroups are mapped sequentially. Good performance, because the pipeline is better filled than for 1x1.

I validated this reasoning on an AMD GPU. But on the ALTERA FPGA, performance does not improve when I increase the workgroup size:
Wkg 1x1 --> 33 ms
Wkg 64x4 --> 33 ms
Wkg 256x4 --> 38 ms

Can someone tell me what is wrong with my reasoning, please? Or is it an ALTERA OpenCL issue?

Thanks!
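
The workgroup counts used in the reasoning above can be checked with a short sketch. This is plain Python arithmetic, not OpenCL host code, and the helper name count_work_groups is hypothetical. It assumes the NDRange divides evenly by the workgroup size in every dimension, as OpenCL 1.x requires, and it says nothing about how the device actually schedules the workgroups.

```python
def count_work_groups(global_size, local_size):
    """Return how many workgroups an NDRange is split into.

    global_size and local_size are tuples with one entry per dimension,
    e.g. (20, 20) and (10, 10). Hypothetical helper for illustration only.
    """
    if len(global_size) != len(local_size):
        raise ValueError("global and local sizes must have the same rank")

    total = 1
    for g, l in zip(global_size, local_size):
        # Assumption: the global size is an exact multiple of the local size.
        if l <= 0 or g % l != 0:
            raise ValueError(f"global size {g} is not a multiple of local size {l}")
        total *= g // l
    return total


if __name__ == "__main__":
    ndrange = (20, 20)
    for wkg in [(1, 1), (10, 10)]:
        print(f"Wkg {wkg[0]}x{wkg[1]} in NDRange {ndrange[0]}x{ndrange[1]}: "
              f"{count_work_groups(ndrange, wkg)} workgroups")
```

For the 20x20 NDRange this gives 400 workgroups at 1x1 and 4 workgroups at 10x10, matching the figures in the post.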