I'm not 100% sure, but I think the behaviour is pretty normal if you don't use burst reads.
Each read requires a complete command on the DDR2 side (including precharge, row and column selection), with its associated delays and latencies. Writes, on the other hand, are pipelined, so there is no need to wait for the ready status.
Refer to the timing diagrams on pages 3-20 and 3-22 of the SDRAM-DDR2 controller core user guide.
For the read, I counted exactly 14 clock cycles between the start of the read command and the assertion of the ready signal.
If you enable burst transfers you can greatly improve performance, since you are reading sequential addresses.
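To get a feel for the difference, here is a rough back-of-the-envelope model in Python. The 14-cycle latency is the figure I counted from the timing diagram; everything else is my assumption, in particular that a burst pays that latency once and then returns one word per clock, and that bursts are not overlapped. The function names are just for illustration. Treat it as a sketch, not as a description of what the controller really does.

```python
# Rough cycle-count model for sequential reads.
# Only the 14-cycle figure comes from the timing diagram; the rest is assumed.
READ_LATENCY_CYCLES = 14


def single_read_cycles(n_words, latency=READ_LATENCY_CYCLES):
    """Non-burst case: every word pays the full read latency."""
    return n_words * latency


def burst_read_cycles(n_words, burst_len, latency=READ_LATENCY_CYCLES):
    """Burst case (assumed model): each burst pays the latency once for its
    first word, then delivers the remaining words at one per clock cycle."""
    full_bursts, leftover = divmod(n_words, burst_len)
    cycles = full_bursts * (latency + burst_len - 1)
    if leftover:
        # A final, shorter burst for the words that did not fill a full one.
        cycles += latency + leftover - 1
    return cycles


if __name__ == "__main__":
    words = 1024
    for burst_len in (1, 2, 4, 8):
        print(burst_len, burst_read_cycles(words, burst_len))
    print("non-burst:", single_read_cycles(words))
```

With a burst length of 1 the model reduces to the non-burst case, and the gain grows with the burst length because the fixed latency is spread over more words. The real numbers depend on the controller configuration, so it is worth checking them against a simulation or SignalTap capture.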