Using a value from the previous iteration
- 6 years ago
If you move the load on line 9 outside of the if condition, the II (initiation interval) will drop, at the cost of higher memory traffic, since the load will then happen on every iteration. However, because the load address depends on dvid, and dvid is incremented conditionally, the II cannot be improved much further that way.

Another thing you can try is to split your input into multiple chunks: load one chunk into a local buffer, perform all the computation on that chunk using local variables only, then write the results for the whole chunk back to global memory. Whether this is possible depends on your application, but in the best case you might be able to reduce the II to ~10 this way.

For applications like this, NDRange will probably work better, since the work-item scheduler can potentially achieve a lower average II at run-time than the fixed II of the single work-item equivalent. At the end of the day, though, your performance will be limited by the random global memory accesses and will be quite poor on FPGAs.
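To make the chunking idea more concrete, here is a minimal sketch in plain C rather than an actual OpenCL kernel, since I can only guess at what your computation looks like. The names `CHUNK`, `step` and `process_chunked` are made up for illustration, and the running sum is just a stand-in for whatever your loop does with the value from the previous iteration. The point is only the three-phase structure: load a chunk, compute on local data, write the chunk back.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK 64 /* hypothetical chunk size; would be tuned to on-chip memory */

/* Hypothetical per-element computation that depends on the previous
 * iteration's value (here simply a running sum). */
static int step(int prev, int x) { return prev + x; }

/* Processes `n` elements of `in` chunk by chunk and stores results in `out`. */
void process_chunked(const int *in, int *out, size_t n) {
    assert(in != NULL && out != NULL);

    int local_in[CHUNK];
    int local_out[CHUNK];
    int prev = 0; /* carried between iterations and across chunks */

    for (size_t base = 0; base < n; base += CHUNK) {
        size_t len = (n - base < CHUNK) ? (n - base) : CHUNK;

        /* 1. Load one chunk from "global" memory with sequential accesses. */
        for (size_t i = 0; i < len; i++) {
            local_in[i] = in[base + i];
        }

        /* 2. Compute on local buffers only; the loop-carried dependency on
         *    `prev` no longer involves a global memory access. */
        for (size_t i = 0; i < len; i++) {
            prev = step(prev, local_in[i]);
            local_out[i] = prev;
        }

        /* 3. Write the whole chunk back to "global" memory. */
        for (size_t i = 0; i < len; i++) {
            out[base + i] = local_out[i];
        }
    }
}
```

In a real kernel, `in` and `out` would be global pointers and the two local arrays would be expected to end up in on-chip memory, but that mapping is an assumption on my part and depends on the compiler and on your access pattern. This also only helps if the data each chunk needs can be loaded up front; if the addresses depend on dvid in a way that crosses chunk boundaries, it will not apply directly.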