I typically use the same clock for the controller as the one I send off chip, since the timing constraints will drive the fitter to move things around for you. If timing is still an issue, combining timing constraints with a phase-shifted off-chip clock can sometimes solve the problem. In that case you are trading read timing against write timing, which is also what the -3.5 ns phase shift that people blindly add to their designs does (but without constraints you have no idea whether it really works).
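To make the read/write trade-off concrete, here is a rough Python sketch of the slack arithmetic. It is a simplified model, not what TimeQuest computes: it ignores board trace delays and clock uncertainty, it assumes single-cycle transfers in both directions, and `phase_ns` is taken as the offset between the SDRAM clock and the controller clock as seen at the pins. The function name and every default timing value are made up for illustration, so substitute the numbers from your own FPGA and SDRAM datasheets.

```python
from dataclasses import dataclass


@dataclass
class Slack:
    write_setup: float
    write_hold: float
    read_setup: float
    read_hold: float


def sdram_slack(phase_ns, period_ns=10.0,
                fpga_tco_max=3.0, fpga_tco_min=1.0,   # hypothetical FPGA output delays
                fpga_tsu=1.5, fpga_th=0.5,            # hypothetical FPGA input setup/hold
                sdram_tsu=1.5, sdram_th=0.8,          # hypothetical SDRAM input setup/hold
                sdram_tac_max=5.4, sdram_toh_min=2.7  # hypothetical SDRAM access/output hold
                ):
    """Slack in ns for a given SDRAM clock phase (negative = SDRAM clock leads)."""
    # Write: the FPGA launches at t = 0 and the SDRAM latches at period + phase.
    write_setup = (period_ns + phase_ns) - fpga_tco_max - sdram_tsu
    # Write hold: new data must not arrive before the SDRAM edge at t = phase
    # plus its hold time.
    write_hold = fpga_tco_min - phase_ns - sdram_th
    # Read: the SDRAM launches at t = phase and the FPGA latches at t = period.
    read_setup = period_ns - (phase_ns + sdram_tac_max) - fpga_tsu
    # Read hold: the next word changes at period + phase + tOH, and the FPGA
    # needs the current word stable until period + its hold time.
    read_hold = (phase_ns + sdram_toh_min) - fpga_th
    return Slack(write_setup, write_hold, read_setup, read_hold)


if __name__ == "__main__":
    for phase in (0.0, -3.5):
        print(phase, sdram_slack(phase))
```

In this model, every nanosecond of phase shift you give to read setup comes straight out of write setup (and the hold slacks move in the opposite directions), so a phase value only makes sense once it has been checked against constraints built from the real device numbers.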
I haven't read that document before, so I can't really comment on it, but I would think you need to set the same constraints across all the I/O for the interface. Here is another doc that I highly recommend:
http://www.alterawiki.com/wiki/timequest_user_guide
Also, I should mention that if your SDRAM interface does not meet timing, this will not affect the DMA (besides the fact that it might transfer corrupt data). So the DMA getting stuck would be caused by other things, like it accessing space, or maybe the software getting clobbered, etc.