Forum Discussion

KAkyo
6 years ago
Solved

Using a value from the previous iteration

Hello, I am trying to implement an application in OpenCL as a single work-item kernel. Below is a code snippet; the line numbers in the report have been changed to match the snippet. unsigned ...
  • HRZ
    6 years ago

    If you move the load on line 9 outside of the if condition, your II (initiation interval) will be reduced, at the cost of higher memory traffic, since the load will then happen on every iteration. However, because the load address depends on dvid, and dvid is incremented conditionally, the II cannot be improved much further this way.

    Another thing you can try to further reduce the II is to split your input into multiple chunks: load one chunk into a local variable, perform all the computation on that chunk using local variables, then write the results of the whole chunk back to global memory. Whether this is possible depends on your application; in the best case, you might be able to reduce the II to ~10 this way.

    For such applications, NDRange will probably work better, since the work-item scheduler can potentially achieve a lower average II at run time than the fixed II of the single work-item equivalent. At the end of the day, though, your performance will be limited by the random global memory accesses and will be quite poor on FPGAs.
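The two suggestions above can be sketched roughly as follows. This is not the original poster's kernel, which is truncated in the post. It is a plain-C model of the loop structure, and every name other than `dvid` (`flags`, `data`, `out`, `CHUNK`, `consume_hoisted`, `consume_chunked`) is a hypothetical stand-in. In an actual OpenCL single work-item kernel, the pointer arguments would be `__global` buffers and the `*_buf` arrays would be private or `__local` arrays mapped to on-chip memory.

```c
#include <stddef.h>

#define CHUNK 64  /* hypothetical chunk size; bounded by on-chip memory */

/* Suggestion 1: hoist the load out of the branch.
 * The load from data[dvid] is issued on every iteration, whether or not
 * the value is used; dvid still advances only when flags[i] is set.
 * Returns the number of values consumed from data. */
size_t consume_hoisted(const unsigned *flags, const unsigned *data,
                       size_t data_len, unsigned *out, size_t n)
{
    size_t dvid = 0;
    for (size_t i = 0; i < n; i++) {
        /* Unconditional load, guarded so it never reads past the end. */
        unsigned val = (dvid < data_len) ? data[dvid] : 0u;
        if (flags[i] && dvid < data_len) {
            out[i] = val;
            dvid++;            /* conditional increment */
        } else {
            out[i] = 0u;
        }
    }
    return dvid;
}

/* Suggestion 2: process the input chunk by chunk.
 * A chunk of len iterations can consume at most len values of data, so
 * loading data[dvid .. dvid+len-1] up front is always sufficient. */
size_t consume_chunked(const unsigned *flags, const unsigned *data,
                       size_t data_len, unsigned *out, size_t n)
{
    unsigned flags_buf[CHUNK], data_buf[CHUNK], out_buf[CHUNK];
    size_t dvid = 0;

    for (size_t base = 0; base < n; base += CHUNK) {
        size_t len = (n - base < CHUNK) ? (n - base) : CHUNK;

        /* Step 1: sequential loads from "global" memory into local buffers. */
        for (size_t j = 0; j < len; j++) {
            flags_buf[j] = flags[base + j];
            data_buf[j] = (dvid + j < data_len) ? data[dvid + j] : 0u;
        }

        /* Step 2: the dependent loop touches only local buffers. */
        size_t used = 0;
        for (size_t j = 0; j < len; j++) {
            if (flags_buf[j] && dvid + used < data_len) {
                out_buf[j] = data_buf[used];
                used++;
            } else {
                out_buf[j] = 0u;
            }
        }

        /* Step 3: write the whole chunk back, then advance dvid once. */
        for (size_t j = 0; j < len; j++) {
            out[base + j] = out_buf[j];
        }
        dvid += used;
    }
    return dvid;
}
```

Both functions produce the same output. The point of the chunked variant is that the loop carrying the `dvid` dependency no longer contains a global memory access, which is the part the reply identifies as limiting the II. Whether the real computation can be restructured this way depends on the application, as noted above.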