We did figure out the problem we were experiencing. It turned out to be a logic bug in how our design handled MSI-X TLPs sent to the PCIe IP core. The Sandy Bridge motherboard resulted in a different DW alignment, and an empty flag was not being set properly for MSI-X TLPs. On our other systems the bug never showed itself, because their addressing (32-bit, versus 64-bit on the Sandy Bridge) never called for the flag.
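To illustrate why the address format matters, here is a rough sketch of the arithmetic. A PCIe memory write uses a 3 DW header with 32-bit addressing and a 4 DW header with 64-bit addressing, and an MSI-X message is a 1 DW memory write. The rest is an assumption, not how any particular vendor core works: I assume header and payload DWs are packed back to back with no alignment padding, and that "empty" counts unused DWs in the final beat. Real cores may insert padding or count empty in QWs, so check your core's user guide.

```python
from typing import Tuple

# Sketch only. Assumptions (hypothetical, not a specific vendor core):
#   - header and payload DWs are packed contiguously, no alignment padding
#   - "empty" is the number of unused DWs in the last bus beat


def header_dws(use_64bit_addr: bool) -> int:
    """Memory write TLP header: 3 DW for 32-bit addressing, 4 DW for 64-bit."""
    return 4 if use_64bit_addr else 3


def beat_layout(use_64bit_addr: bool, payload_dws: int, bus_dws: int) -> Tuple[int, int]:
    """Return (beats, empty_dws_in_last_beat) for a memory write TLP."""
    total = header_dws(use_64bit_addr) + payload_dws
    beats = -(-total // bus_dws)  # ceiling division
    empty = beats * bus_dws - total
    return beats, empty


if __name__ == "__main__":
    # MSI-X message = 1 DW payload; 128-bit bus = 4 DWs per beat.
    for is_64 in (False, True):
        beats, empty = beat_layout(is_64, payload_dws=1, bus_dws=4)
        label = "64-bit" if is_64 else "32-bit"
        print(f"{label}: {beats} beat(s), empty={empty}")
    # 32-bit: 1 beat(s), empty=0
    # 64-bit: 2 beat(s), empty=3
```

Under these assumptions the 32-bit case fills its single beat exactly, so a bug in the empty-flag logic never triggers, while the 64-bit case spills one DW into a second, mostly empty beat.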
I am not sure my experience matches what you are seeing. I found that once the core is fed a malformed TLP by the user logic, it breaks down completely and transmits garbage. It is worth verifying, but since you were able to fix your performance problem in software, it is probably not the same issue.
Good luck! There are many people on these forums who are far more knowledgeable about PCIe and Linux and may be able to chime in and help.
Jay