--- Quote Start ---
At every iteration of the 2nd loop, the value from the previous iteration is kept instead of a new local variable being created. What's going wrong?
--- Quote End ---
Most C compilers, including the one behind Altera's emulator, do not actually re-create scoped variables on every iteration; they keep reusing the same storage, which is why the value from the previous iteration is still there. When you compile for hardware execution, however, the variable scope will be taken into account. Either way, you MUST initialize all your scoped variables. In your code, the first assignment to those two variables is conditional, so if the condition is not met, the variables are never assigned a value but are still used in the statements after it, and you can get incorrect output. Depending on how your algorithm works, this might never happen, but you should still make sure the lack of initialization on those variables can never cause trouble.
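To illustrate the initialization point, here is a minimal sketch in plain C. It is not your kernel: the function `last_above` and all of its parameters are made up for the example. The two scoped variables are given a value at the top of every outer iteration, so the conditional assignment inside the inner loop can never leave them holding something from the previous iteration.

```c
#include <stddef.h>

/* Hypothetical example: for each row, record the index of the last element
 * above a threshold (-1 if there is none) and how many such elements exist. */
void last_above(const int *data, size_t rows, size_t cols,
                int threshold, int *last_idx, int *count)
{
    for (size_t r = 0; r < rows; r++) {
        /* Scoped variables, explicitly initialized on every iteration. */
        int idx = -1;
        int n = 0;

        for (size_t c = 0; c < cols; c++) {
            /* The only other assignment is conditional. */
            if (data[r * cols + c] > threshold) {
                idx = (int)c;
                n++;
            }
        }

        /* Safe to use here even if the condition was never true. */
        last_idx[r] = idx;
        count[r] = n;
    }
}
```

Without the two initializers, a row with no element above the threshold would report whatever `idx` and `n` happened to contain, which in the emulator is typically the result of the previous row.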
--- Quote Start ---
P.S. In red I have a memory dependency; how do I resolve this?
--- Quote End ---
I would guess the "total_gin" variable is implemented using Block RAMs due to its size, and since the access latency of Block RAMs is NOT one clock cycle, you will get load/store dependencies. To get single-cycle accesses, you should either use a smaller buffer that can be implemented using registers or, if your algorithm allows, convert that buffer to a shift register. If neither of these can be done, switching to NDRange could help, since the initiation interval (II) is then adjusted at runtime by the scheduler, which could allow better performance than the fixed II of the equivalent single work-item kernel.
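For reference, the shift register pattern looks roughly like the sketch below. It is written as plain C so it can be run on a host; `window_sum` and `WINDOW` are hypothetical names, and I am assuming your algorithm only needs a fixed-size window of recent values, which may not be true for "total_gin". In an actual single work-item kernel the two inner loops would normally be fully unrolled (e.g. with `#pragma unroll`), and every index into the buffer has to be known at compile time for the compiler to infer a shift register.

```c
#define WINDOW 4   /* hypothetical window size; must be a compile-time constant */

/* Hypothetical example: out[i] is the sum of the last WINDOW inputs up to in[i]. */
void window_sum(const int *in, int *out, int n)
{
    /* Small buffer accessed only at fixed indices. */
    int sr[WINDOW] = {0};

    for (int i = 0; i < n; i++) {
        /* Shift every element down by one position (unroll fully in a kernel). */
        for (int j = 0; j < WINDOW - 1; j++) {
            sr[j] = sr[j + 1];
        }
        /* The new value always enters at the same, fixed position. */
        sr[WINDOW - 1] = in[i];

        /* Read the whole window (unroll fully in a kernel). */
        int sum = 0;
        for (int j = 0; j < WINDOW; j++) {
            sum += sr[j];
        }
        out[i] = sum;
    }
}
```

The point of the pattern is that there is no data-dependent address anywhere: each element moves to a fixed neighbour on every iteration, so the buffer does not need to live in Block RAM.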