Thank you for the detailed information, very helpful!!!
--- Quote Start ---
...Then when you say that you counted the cycles using the debugger, was it when stepping through C code or assembly? One step in the C code could be several assembly instructions.
If you don't have instruction or data cache you also need to take into account several clock cycles due to the DRAM latency...
--- Quote End ---
I stepped through in assembly.
So that gives 12 instructions per while-loop iteration, times 6 cycles per instruction, times a factor x for the DRAM latency (by the way, x is approximately 2.5).
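Just to write the arithmetic down, here is a minimal sketch in C. The function name and parameter names are made up by me for illustration, and the three numbers are only the rough figures from this thread, not exact hardware constants.

```c
#include <assert.h>

/* Rough estimate of the clock cycles spent in one loop iteration.
 * The inputs are approximate figures taken from stepping through
 * the assembly in the debugger; the names are hypothetical. */
double estimate_cycles_per_iteration(unsigned instructions_per_iteration,
                                     unsigned cycles_per_instruction,
                                     double dram_latency_factor)
{
    /* The latency factor is a multiplier, so it must be positive. */
    assert(dram_latency_factor > 0.0);

    return (double)instructions_per_iteration
         * (double)cycles_per_instruction
         * dram_latency_factor;
}
```

With the values above, estimate_cycles_per_iteration(12, 6, 2.5) comes out at about 180 cycles per iteration of the while loop.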
I have the feeling that I basically understand the delay now, thank you again.
Best regards
Stefan