Does this issue happen in simulation? If so, are you waiting for the calibration phase to complete? If I remember correctly, it takes around 250 us. While the controller is still calibrating, a few read commands will be accepted, then waitrequest will be asserted for roughly 250 us, and only after that will the reads return.
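To make that sequence concrete, here is a toy cycle-level Python model of the handshake. It is not the controller's RTL or any Altera API. The command FIFO depth, the calibration length in cycles, and the names `CalibratingController` and `run_reads` are all hypothetical. The model only shows the ordering: a few reads get accepted, waitrequest then stays high until calibration ends, and only then do the queued reads come back. The master holds its read request stable while waitrequest is asserted, as Avalon-MM requires.

```python
from collections import deque


class CalibratingController:
    """Toy cycle-level model of a memory controller's Avalon-MM slave port.

    Hypothetical parameters, for illustration only:
      cmd_fifo_depth: read commands accepted before waitrequest asserts
      cal_cycles: clock cycles until calibration completes
    """

    def __init__(self, cmd_fifo_depth=4, cal_cycles=100):
        self.cmd_fifo_depth = cmd_fifo_depth
        self.cal_cycles = cal_cycles
        self.cycle = 0
        self.pending = deque()  # accepted but not yet returned reads

    @property
    def waitrequest(self):
        # Back-pressure the master once the command FIFO is full.
        return len(self.pending) >= self.cmd_fifo_depth

    def tick(self, read, address):
        """Advance one clock; return (accepted, readdatavalid, readdata)."""
        stalled = self.waitrequest  # value the master sees this cycle
        readdatavalid, readdata = False, None
        # Queued reads are only serviced once calibration has finished.
        if self.cycle >= self.cal_cycles and self.pending:
            readdata = self.pending.popleft()  # toy model: data == address
            readdatavalid = True
        accepted = read and not stalled
        if accepted:
            self.pending.append(address)
        self.cycle += 1
        return accepted, readdatavalid, readdata


def run_reads(num_reads, cmd_fifo_depth=4, cal_cycles=100, max_cycles=10_000):
    """Simple master: issues reads to consecutive addresses and keeps
    read/address unchanged while waitrequest is asserted."""
    ctrl = CalibratingController(cmd_fifo_depth, cal_cycles)
    next_addr = 0
    accepted_at, stalled_at, returned = [], [], []
    for _ in range(max_cycles):
        cycle = ctrl.cycle
        read = next_addr < num_reads
        accepted, valid, data = ctrl.tick(read, next_addr)
        if accepted:
            accepted_at.append(cycle)
            next_addr += 1  # advance only after the slave accepts
        elif read:
            stalled_at.append(cycle)  # waitrequest held the request off
        if valid:
            returned.append((cycle, data))
        if len(returned) == num_reads:
            break
    return accepted_at, stalled_at, returned


if __name__ == "__main__":
    acc, stall, ret = run_reads(num_reads=6, cmd_fifo_depth=4, cal_cycles=10)
    print("accepted on cycles:", acc)
    print("stalled cycles:", len(stall))
    print("first read returned on cycle:", ret[0][0])
```

With a 4-deep FIFO and 10 calibration cycles, the first four reads are accepted on cycles 0 to 3. The master then stalls for 7 cycles, and the first read data returns on cycle 10. In a real simulation, the stall would last for the whole calibration period instead.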
If you are looking for functionality similar to the master templates, I recommend using the master components in the Qsys tutorial example design. The interface to control them is a little different (an Avalon-ST sink that accepts transfer requests), but they should work a lot better. Similarly, the modular SGDMA on the wiki would be another alternative if you need more features in the master hardware.