Simulating the design would make debugging much easier, since you'll be able to see what the master is doing when control_done never goes high.
Some things off the top of my head:
1) Make sure your start address is aligned to the data width. For a 32-bit master, the start address needs to be on a 4-byte boundary.
2) Make sure the transfer length (in bytes) is a multiple of the data width. For a 32-bit master, lengths of 4, 8, 12, 16, etc. are valid.
3) Never start the master with a transfer length of 0. I have no clue what will happen if you do.
4) Never assert control_go while control_done is low, since the master is still working on the previous transfer.
5) Never start the master at an address that doesn't exist in the system or use a transfer length so long that the master eventually accesses memory locations that do not exist.
6) Never use the early_control_done signal to determine when it's safe to start the read master. Use control_done for that.
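
If software is what programs the master, most of the checks above can be done before control_go is ever asserted. Here is a rough sketch in C of what that could look like. None of these names come from the master itself: `mem_region_t`, `xfer_check_t` and `check_transfer` are made up for illustration, and I'm assuming a 32-bit address space, a single contiguous memory region, and that the caller has already read the current state of control_done.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of the memory the master is allowed to touch. */
typedef struct {
    uint32_t base;  /* first valid byte address */
    uint32_t span;  /* number of valid bytes starting at base */
} mem_region_t;

/* Hypothetical result codes, one per rule in the checklist. */
typedef enum {
    XFER_OK = 0,
    XFER_ERR_MASTER_BUSY,         /* control_done is low */
    XFER_ERR_ZERO_LENGTH,         /* transfer length of 0 */
    XFER_ERR_UNALIGNED_ADDRESS,   /* start not on a data-width boundary */
    XFER_ERR_LENGTH_NOT_MULTIPLE, /* length not a multiple of the data width */
    XFER_ERR_OUT_OF_RANGE         /* transfer leaves the memory region */
} xfer_check_t;

/*
 * Check a transfer before starting the master.
 * width_bytes is the master data width in bytes (4 for a 32-bit master)
 * and is assumed to be nonzero. control_done is the value the caller
 * sampled from the master (assumption: high means idle).
 */
xfer_check_t check_transfer(uint32_t start, uint32_t length,
                            uint32_t width_bytes,
                            const mem_region_t *mem, bool control_done)
{
    if (!control_done) {
        return XFER_ERR_MASTER_BUSY;
    }
    if (length == 0) {
        return XFER_ERR_ZERO_LENGTH;
    }
    if (start % width_bytes != 0) {
        return XFER_ERR_UNALIGNED_ADDRESS;
    }
    if (length % width_bytes != 0) {
        return XFER_ERR_LENGTH_NOT_MULTIPLE;
    }
    /* Range check written so that start + length cannot overflow. */
    if (start < mem->base) {
        return XFER_ERR_OUT_OF_RANGE;
    }
    uint32_t offset = start - mem->base;
    if (offset >= mem->span || length > mem->span - offset) {
        return XFER_ERR_OUT_OF_RANGE;
    }
    return XFER_OK;
}
```

Only program the address and length and assert control_go when this returns `XFER_OK`. It does not replace simulation, but it catches the mistakes in 1) through 5) before they reach the hardware.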