I think you should add some SignalTap probes to see what is happening and why your masters are stalled for so long.
One possible explanation is that you have two masters trying to access different portions of memory at the same time, and the arbiter is letting each one access only one word at a time. In that case you lose all the DRAM latency cycles on every access, instead of paying them once for a whole run of words.
The master interface has some features to speed up access, such as bursts and/or pipelined transfers with arbiter lock. You should look into those, so that each master can access several words in a row and avoid most of the lost latency cycles.
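
To get a rough feel for how much this can cost, here is a small back-of-the-envelope model in Python. It is only a sketch: the function names, the 8-cycle latency and the burst length of 16 are assumptions chosen for illustration, not values from your design or from any controller datasheet, and it ignores refresh, row changes and arbitration overhead.

```python
def cycles_single_word(words: int, latency: int) -> int:
    """Cycles when every word is a separate access.

    Each access is assumed to pay the full latency plus one data cycle.
    """
    return words * (latency + 1)


def cycles_burst(words: int, latency: int, burst_len: int) -> int:
    """Cycles when words are fetched in bursts of burst_len.

    The latency is assumed to be paid once per burst, then one data
    cycle per word.
    """
    bursts = -(-words // burst_len)  # ceiling division
    return bursts * latency + words


def speedup(words: int, latency: int, burst_len: int) -> float:
    """Ratio of single-word cycles to burst cycles for the same transfer."""
    return cycles_single_word(words, latency) / cycles_burst(words, latency, burst_len)
```

With these assumed numbers, moving 64 words takes 576 cycles one word at a time and 96 cycles in bursts of 16, so the masters would spend about six times longer waiting in the single-word case. Your real figures will differ, which is why the SignalTap capture is worth doing first.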