I think I almost follow that!
In our case the PCIe master doesn't usually access the internal memory block that is causing us issues. It does do single word cycles into a different M9K block (tightly coupled to the other cpu), and to a small avalon slave we use as an interrupt requestor (to the ppc at the other end of the PCIe link).
The PCIe master also does read/write to the SDRAM - these will be longer requests.
Looks like it would make sense (in any case) for us to put an explicit non-bursting clock crossing bridge between the PCIe Avalon master and all the Avalon slaves except SDRAM. The SDRAM might benefit from a bursting bridge - since the software tries quite hard to do multi-word accesses to the SDRAM.
At the moment we've seen errors on different cards with different fpga images. The problem I see is that any fpga rebuild is likely to change the resource allocation - and the whole problem looks like a marginal timing issue somewhere, so a rebuilt image might appear to work only because it no longer uses the part of that specific fpga that is close to the tolerance limits.
I can, of course, detect the specific error we are seeing in software (by doing a re-read). But there are other shared locations where that would be much more difficult. Making the M9K data blocks not 'tightly coupled' (so they use the Avalon arbiter) would also slow the code down too much.
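For reference, the re-read check I have in mind is roughly the following. This is only a sketch: the name read_stable32 and the retry limit are made up for illustration, and it assumes the location is a 32-bit word that nothing else is legitimately changing between the two reads.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from any vendor library): read a shared 32-bit
 * word repeatedly until two consecutive reads return the same value.
 * Sets *ok to 1 if a matching pair was seen within max_tries re-reads,
 * otherwise sets *ok to 0 and returns the last value read.
 */
static uint32_t read_stable32(const volatile uint32_t *addr,
                              unsigned max_tries, int *ok)
{
    uint32_t prev = *addr;              /* first read */

    for (unsigned i = 0; i < max_tries; i++) {
        uint32_t cur = *addr;           /* re-read the same location */
        if (cur == prev) {
            *ok = 1;                    /* two reads agree, accept the value */
            return cur;
        }
        prev = cur;                     /* mismatch, try again */
    }

    *ok = 0;                            /* never saw two matching reads */
    return prev;
}
```

That only helps where a mismatch between two reads really does mean a bad read. For locations the other side updates all the time, two different values are perfectly legitimate, which is why it gets much harder there.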