Altera_Forum
Honored Contributor
11 years ago
Hi Jack,
I want to expand on your last sentence. You can use Task-kernel techniques such as shift registers within an NDRange kernel, since a Task kernel is just an NDRange kernel with a single work-group containing a single work-item. For example, for a moving average filter, you could divide and conquer the input domain and have each work-item in a work-group process its own (overlapping) subdomain using its own shift register. You could then scale throughput by setting the number of work-items per work-group with __attribute__((reqd_work_group_size(X, 1, 1))), increasing X until you consume the desired portion of your device's resources (logic, registers, memory bandwidth, etc.). I recommend giving it a try and seeing how it works for your problems.
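To make the decomposition concrete, here is a rough sketch of what I have in mind. It is plain C that models the work-group on the host, not a real kernel: the loop over `lid` stands in for the work-items, and in an actual NDRange kernel `lid` would come from get_local_id(0) and WORK_ITEMS would be the X in reqd_work_group_size. The names (TAPS, WORK_ITEMS, moving_average_work_item, moving_average_work_group), the 4-tap window, and the "valid-only" output convention are all my own assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TAPS 4        /* filter window length (assumed for illustration) */
#define WORK_ITEMS 4  /* stands in for X in reqd_work_group_size(X, 1, 1) */

/* Body of one work-item. It produces out[begin..end) and therefore reads
 * in[begin..end+TAPS-2]; the last TAPS-1 inputs overlap with the next
 * work-item's subdomain. Each work-item owns a private shift register. */
static void moving_average_work_item(const float *in, float *out,
                                     size_t n_out, size_t lid)
{
    size_t chunk = (n_out + WORK_ITEMS - 1) / WORK_ITEMS;
    size_t begin = lid * chunk;
    size_t end = (begin + chunk < n_out) ? begin + chunk : n_out;
    if (begin >= end)
        return; /* more work-items than output chunks */

    float sr[TAPS]; /* private shift register */

    /* Preload so that the first shift leaves the first TAPS-1 samples
     * of this subdomain in sr[0..TAPS-2]. */
    sr[0] = 0.0f;
    for (size_t t = 0; t + 1 < TAPS; t++)
        sr[t + 1] = in[begin + t];

    for (size_t i = begin; i < end; i++) {
        /* Shift by one and push the newest sample in at the end. */
        for (size_t t = 0; t + 1 < TAPS; t++)
            sr[t] = sr[t + 1];
        sr[TAPS - 1] = in[i + TAPS - 1];

        /* Average the window in[i..i+TAPS-1] now held in the register. */
        float sum = 0.0f;
        for (size_t t = 0; t < TAPS; t++)
            sum += sr[t];
        out[i] = sum / (float)TAPS;
    }
}

/* Host-side stand-in for launching one work-group of WORK_ITEMS work-items.
 * Returns the number of outputs written (n_in - TAPS + 1). */
size_t moving_average_work_group(const float *in, float *out, size_t n_in)
{
    assert(n_in >= TAPS);
    size_t n_out = n_in - TAPS + 1;
    for (size_t lid = 0; lid < WORK_ITEMS; lid++)
        moving_average_work_item(in, out, n_out, lid);
    return n_out;
}
```

With 16 inputs this yields 13 outputs, split 4/4/4/1 across the four work-items. Raising WORK_ITEMS replicates the shift register once per work-item, which is where the extra resource use comes from.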