Being able to write a block of bytes to the SDRAM and read it back is not final proof that the timing is OK. Between one access and the next (while the pointer-handling code executes) there can be enough idle time to mask timing problems, and a data cache would mask them as well.
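If you still want a software check, it should at least issue accesses back to back and bypass the data cache. Here is a minimal sketch of that idea. The function name sdram_stress_test() and the data patterns are hypothetical, not part of any Altera/Intel API. base must point at an uncached view of the SDRAM (on Nios II, for example, through the HAL's alt_remap_uncached()). A cached pointer would let the data cache hide the very errors you are looking for. Even when it passes, it stresses the interface less than running code from SDRAM does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical test patterns: word i in pass p. Mixing the address into
   the data makes address/bank errors show up as mismatches. */
static uint32_t pattern(size_t i, unsigned pass)
{
    switch (pass) {
    case 0:  return (uint32_t)i;                            /* address as data  */
    case 1:  return ~(uint32_t)i;                           /* inverted address */
    case 2:  return (i & 1u) ? 0x55555555u : 0xAAAAAAAAu;   /* neighbours differ in every bit */
    default: return (uint32_t)1u << (i % 32u);              /* walking one      */
    }
}

/* Writes the whole region in one tight loop, then reads it all back in a
   second tight loop, so there is minimal work between consecutive accesses.
   'base' is assumed to be an UNCACHED pointer to the SDRAM under test.
   Returns the total number of mismatching words over all passes. */
size_t sdram_stress_test(volatile uint32_t *base, size_t words)
{
    size_t errors = 0;
    for (unsigned pass = 0; pass < 4; ++pass) {
        for (size_t i = 0; i < words; ++i)      /* back-to-back writes */
            base[i] = pattern(i, pass);
        for (size_t i = 0; i < words; ++i)      /* back-to-back reads  */
            if (base[i] != pattern(i, pass))
                ++errors;
    }
    return errors;
}
```

Call it with the uncached SDRAM address and the number of 32-bit words to test; a nonzero return value is the number of mismatches found.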
Create a new project targeted to run from your SDRAM, then compile it and try to run it. If you have a timing problem, it will show up here: the JTAG debug module and the instruction cache put much more stress on the SDRAM Avalon slave interface than your code does.
Try several delay (phase compensation) values for the SDRAM clock output of your PLL.
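When sweeping the compensation, keep in mind that the same number of degrees means a different delay at each clock frequency: Δt = (φ / 360°) · T, with T = 1/f. Here is a minimal sketch of that conversion and of generating a list of candidate shifts to try one at a time. The function names are hypothetical, and the values you actually need depend on your board and device.

```c
#include <stddef.h>

/* Convert a PLL output phase shift in degrees to a time shift in
   picoseconds for a clock of clk_mhz MHz. Negative degrees give a
   negative (earlier) shift. */
double phase_deg_to_ps(double clk_mhz, double phase_deg)
{
    double period_ps = 1.0e6 / clk_mhz;   /* period in ps for f in MHz */
    return phase_deg / 360.0 * period_ps;
}

/* Fill out[0..n-1] with candidate shifts in ps, starting at start_deg
   and stepping by step_deg degrees. */
void phase_sweep_ps(double clk_mhz, double start_deg, double step_deg,
                    double *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = phase_deg_to_ps(clk_mhz, start_deg + step_deg * (double)i);
}
```

For example, at 100 MHz (T = 10 ns) a shift of -90° corresponds to -2.5 ns.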