Altera_Forum
Honored Contributor
9 years ago

--- Quote Start ---
Hi, I have done it. But the issue is that this design carries out the instructions and only the output is muxed according to the thread id. This means that the latency for the final output increases. Any other input?
--- Quote End ---

Hmm, I can't think of anything branching-wise that would result in a lower latency, but have you tried creating separate kernels and passing the data via channels?
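
To make the channels idea concrete, here is a rough sketch, assuming the Altera SDK for OpenCL channels extension (`cl_altera_channels`; newer Intel releases rename the calls to `read_channel_intel` / `write_channel_intel`). The kernel names, the even/odd routing rule, and the arithmetic in each path are placeholders I made up for illustration, not taken from your design. The point is only that each branch becomes its own kernel, so neither pipeline carries the other branch's instructions.

```c
// Sketch only: kernel names, routing rule and per-path math are hypothetical.
#pragma OPENCL EXTENSION cl_altera_channels : enable

// One channel per branch; the dispatcher feeds them, the workers drain them.
channel int to_path_a;
channel int to_path_b;

// Routes each input element to exactly one path instead of computing
// both branches and muxing the result afterwards.
__kernel void dispatch(__global const int *restrict in, int n)
{
    for (int i = 0; i < n; i++) {
        int v = in[i];
        if (i & 1)
            write_channel_altera(to_path_b, v);   // odd indices
        else
            write_channel_altera(to_path_a, v);   // even indices
    }
}

// Branch A as its own kernel. n_a must equal the number of elements
// the dispatcher sends to to_path_a (here: (n + 1) / 2).
__kernel void path_a(__global int *restrict out, int n_a)
{
    for (int i = 0; i < n_a; i++) {
        int v = read_channel_altera(to_path_a);   // blocks until data arrives
        out[i] = v * 3;                           // placeholder work
    }
}

// Branch B as its own kernel. n_b must equal the number of elements
// the dispatcher sends to to_path_b (here: n / 2).
__kernel void path_b(__global int *restrict out, int n_b)
{
    for (int i = 0; i < n_b; i++) {
        int v = read_channel_altera(to_path_b);   // blocks until data arrives
        out[i] = v + 7;                           // placeholder work
    }
}
```

On the host side, the three kernels have to run concurrently, which normally means enqueueing each one on its own command queue. The channel reads and writes are blocking, so if one of the kernels is never launched, or the counts passed to the workers don't match what the dispatcher sends, the design will hang. Whether this actually lowers your latency depends on how much logic each branch contains, so I'd check the report for both versions.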