Well, the first thing I can think of is that you should try adjusting the weights of the FPGA ports to see if that helps. Other than that, do you know how hard your custom IP is hitting the bus and whether or not you're flooding the SDRAM controller?
I've been able to do DDR to SDRAM, but I definitely get stalls if I push it too hard. I'm trying to mitigate that with better interrupt handling, etc., but this sounds like something else. It might even be some sort of spinlock issue.
Does your IP work from userspace? I develop using /dev/mem mappings to start with before hitting the kernel, but that's personal preference.
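
In case it helps, here is roughly what I mean by a /dev/mem mapping. Treat it as a minimal sketch, not a drop-in: `map_regs` and `unmap_regs` are names I made up for this post, and the base address and register layout depend entirely on your design, so take those from your own address map.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map `len` bytes starting at offset `base` of `dev`
 * (normally "/dev/mem", where `base` is a physical address).
 * `base` must be page-aligned. Returns NULL on failure. */
static volatile uint32_t *map_regs(const char *dev, off_t base, size_t len)
{
    int fd = open(dev, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, base);
    close(fd); /* the mapping stays valid after the descriptor is closed */

    return p == MAP_FAILED ? NULL : (volatile uint32_t *)p;
}

/* Hypothetical helper: release a mapping created by map_regs(). */
static void unmap_regs(volatile uint32_t *regs, size_t len)
{
    munmap((void *)regs, len);
}
```

You would call something like `map_regs("/dev/mem", YOUR_IP_BASE, 4096)` as root (`YOUR_IP_BASE` is a placeholder for your IP's physical base address), then poke registers with `regs[n]`, where `n` is the byte offset divided by 4. If the IP behaves from userspace but stalls under the kernel driver, that points at the driver's locking or interrupt path and not at the bus.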