Forum Discussion
Altera_Forum
Honored Contributor
9 years ago
Hi,
I have already done that. The issue is that the resulting design still carries out all of the instructions, and only the output is muxed according to the thread ID. This means that the latency of the final output increases. Any other suggestions?
--- Quote Start ---
It's definitely possible if you're running an NDRange kernel. Look at the available work-item functions: get_global_id(uint D), get_global_size(uint D), get_local_id(uint D), get_local_size(uint D), etc. Once you have the thread ID numbers, you can branch off and execute your different instructions.
--- Quote End ---
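For anyone following along, here is a minimal sketch of the kind of thread-ID branching described in the quote. The kernel name branch_by_id, the even/odd split, and the two operations are made up for illustration and are not from my actual design. The shim at the top and the run_ndrange helper are also assumptions: they exist only so the kernel body can be compiled and checked as plain C on a host, and an OpenCL compiler would skip them.

```c
#include <stddef.h>

/* Host-side shim (an assumption for illustration): lets the kernel body
 * compile as plain C. A real OpenCL compiler defines __OPENCL_VERSION__
 * and provides these qualifiers and get_global_id() itself. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
static size_t current_gid;  /* ID of the work-item being emulated */
static size_t get_global_id(unsigned int dim) { (void)dim; return current_gid; }
#endif

/* Hypothetical kernel: each work-item picks its instructions by thread ID. */
__kernel void branch_by_id(__global const int *in, __global int *out)
{
    size_t gid = get_global_id(0);

    if (gid % 2 == 0) {
        out[gid] = in[gid] + 1;   /* even work-items: add one */
    } else {
        out[gid] = in[gid] * 2;   /* odd work-items: double */
    }
}

#ifndef __OPENCL_VERSION__
/* Hypothetical host helper: emulates a 1-D NDRange by calling the kernel
 * once per global ID, sequentially. */
static void run_ndrange(const int *in, int *out, size_t global_size)
{
    for (current_gid = 0; current_gid < global_size; ++current_gid) {
        branch_by_id(in, out);
    }
}
#endif
```

This only shows the functional behaviour of the branch. It says nothing about how the hardware is generated, which is exactly where I see both sides being computed and the result muxed by thread ID.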