Altera_Forum
Honored Contributor
8 years ago
So, by default, the number of compute units is one. Does that mean that when I use a local work-group size of 64x64, the FPGA loads 64x64 work-items but does not execute them at the same time? Are they executed one after another, similar to a for loop, but in no particular order?
Does the pipeline start a work-item only after the previous work-item has finished, or do work-items execute partially overlapped, depending on the algorithm, like a pipelined for loop?
I am also confused about SIMD versus compute units. I have read the Best Practices Guide. I know that when I set the number of compute units to 2, the compiler duplicates the kernel, so hardware memory usage doubles. So if I want to increase parallelism, I can duplicate the kernel up to the limit of the FPGA. What is the difference between SIMD and compute units? Does SIMD also increase hardware memory usage? What happens if I increase SIMD too much?
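
For concreteness, the two options being compared are set as kernel attributes in the Altera SDK for OpenCL. The following is only a minimal sketch: the kernel name `scale` and its body are hypothetical, the work-group size is simplified to one dimension, and the small shim at the top is an assumption added so the file also compiles as plain C outside an OpenCL compiler.

```c
/* Shim (assumption, not part of the SDK): lets this sketch build as plain C.
   An OpenCL compiler defines __OPENCL_VERSION__ and skips this part. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
static size_t sketch_gid; /* hypothetical stand-in for the work-item ID */
static size_t get_global_id(unsigned dim) { (void)dim; return sketch_gid; }
#endif

/* Hypothetical kernel: the name and body are illustrative only. */
__attribute__((reqd_work_group_size(64, 1, 1))) /* fixed work-group size */
__attribute__((num_simd_work_items(4)))  /* vectorize the datapath: 4 work-items in SIMD fashion */
__attribute__((num_compute_units(2)))    /* replicate the whole kernel pipeline twice */
__kernel void scale(__global const float *in, __global float *out)
{
    size_t i = get_global_id(0);
    out[i] = 2.0f * in[i];
}
```

In practice one would usually try each attribute on its own first and compare the resource estimates in the compiler report; both are combined here only to show the syntax side by side.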