--- Quote Start ---
Most C compilers, including Altera's emulator, do not actually re-create scoped variables each time the scope is entered; they keep reusing the same storage. When you are compiling for hardware execution, however, the variable scope will be taken into account. Either way, you MUST initialize all your scoped variables. In your code, the first assignment to those two variables is conditional, so you can get incorrect output if the condition is false, the variable is never assigned, and it is then used in the statements after that. Depending on how your algorithm works, this might never happen, but you should still make sure that the lack of initialization on those variables can never cause trouble.
I would guess the "total_gin" variable is implemented using Block RAMs due to its size, and since access latency to Block RAMs is NOT one clock cycle, you will get load/store dependencies. To get single-cycle accesses, you should either use a smaller buffer that can be implemented using registers, or, if your algorithm allows, convert that buffer to a shift register. If none of these can be done, switching to NDRange could help since the initiation interval (II) is adjusted at runtime by the scheduler and hence could allow better performance compared to the fixed II in the equivalent single work-item kernel.
--- Quote End ---
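To make the initialization point from the quote concrete, here is a minimal sketch in plain C. The function name `clamp_sum` and its parameters are hypothetical and are not taken from the kernel being discussed; the only point is that `v` is given a value every time its scope is entered, so the conditional assignment can never leave it undefined.

```c
/* Hypothetical example: sums only the inputs that are below `limit`. */
int clamp_sum(const int *in, int n, int limit)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        /* Initialize the scoped variable on every iteration. Without
           "= 0", v would be undefined whenever the condition is false. */
        int v = 0;
        if (in[i] < limit) {
            v = in[i];   /* conditional assignment */
        }
        sum += v;        /* v is used after the conditional */
    }
    return sum;
}
```

If the `= 0` is dropped, an emulator that reuses the same storage may silently carry over the value from the previous iteration, which is why emulation and hardware results can differ.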
Brilliant mate, I didn't know that the emulator would not redefine the variable inside the loop.
For total_gin, I have this info:
Stall-free: Yes
Loads from: total_gin
Start-Cycle: 1
Latency: 3
How can I implement a shift register in OpenCL, given the total_gin info above?
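For reference, the general shape of a shift register looks roughly like the sketch below. It is written as plain C so it can be checked on a host compiler, and it is not based on the actual total_gin access pattern, which is not shown here. `WINDOW`, `sliding_sum`, and the moving-sum computation are all hypothetical. The pattern the offline compiler is generally documented to look for is: a private array of compile-time-constant size, every element shifted by one position on each iteration of the main loop, and all reads and writes at constant indices. In a single work-item kernel the inner loops would normally be fully unrolled with `#pragma unroll`.

```c
#define WINDOW 4   /* hypothetical size; must be a compile-time constant */

/* Hypothetical example: out[i] = sum of the last WINDOW inputs up to in[i]. */
void sliding_sum(const float *in, float *out, int n)
{
    float shift_reg[WINDOW];

    /* Initialize every element so that no stale value is ever read. */
    for (int j = 0; j < WINDOW; j++) {
        shift_reg[j] = 0.0f;
    }

    for (int i = 0; i < n; i++) {
        /* Shift the whole array by one position toward index 0.
           In an OpenCL kernel this loop would carry "#pragma unroll". */
        for (int j = 0; j < WINDOW - 1; j++) {
            shift_reg[j] = shift_reg[j + 1];
        }

        /* Insert the new value at the tail (constant index). */
        shift_reg[WINDOW - 1] = in[i];

        /* Read at constant indices only; also unrolled in a kernel. */
        float sum = 0.0f;
        for (int j = 0; j < WINDOW; j++) {
            sum += shift_reg[j];
        }
        out[i] = sum;
    }
}
```

Whether total_gin can be restructured this way depends on the algorithm: it only works if each iteration touches the buffer at fixed offsets relative to the newest element, rather than at arbitrary data-dependent indices.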