Yes, that’s possible. But be warned: not only do you have to split your 4k read into requests no larger than the maximum read request size, but the completer (typically the north bridge) can split its responses again at 64-byte boundaries!
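To make the two levels of splitting concrete, here is a minimal Python model (a sketch, not driver or HDL code). The function names, the 128-byte request limit and the 64-byte boundary are assumptions picked for illustration. The model also keeps requests from crossing a 4 KiB address boundary, which PCIe requires. Note that `split_completion` shows the worst case, where the completer cuts at every boundary; a real completer may return larger pieces.

```python
def split_read(addr, length, max_read_request=128):
    """Split one DMA read into requests that respect the size limit
    and never cross a 4 KiB address boundary."""
    requests = []
    end = addr + length
    while addr < end:
        # Bytes left before the next 4 KiB boundary.
        to_4k = 4096 - (addr % 4096)
        size = min(max_read_request, to_4k, end - addr)
        requests.append((addr, size))
        addr += size
    return requests


def split_completion(addr, length, boundary=64):
    """Worst case: the completer cuts the data of one request at every
    address that is a multiple of `boundary`."""
    parts = []
    end = addr + length
    while addr < end:
        next_boundary = (addr // boundary + 1) * boundary
        size = min(next_boundary, end) - addr
        parts.append((addr, size))
        addr += size
    return parts
```

For example, an aligned 4k read with a 128-byte limit becomes 32 requests, and a 128-byte request starting at 0x1010 may come back as three completions of 48, 64 and 16 bytes.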
Remember that the completions for different requests (i.e. different tag ids) can be received out of order, but the various partial completions for the same request will be received in order.
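The ordering rule means the receive side only needs one running offset per tag. A minimal sketch of that bookkeeping follows; the class `ReadTracker` and its methods are hypothetical names for illustration and do not belong to any vendor IP.

```python
class ReadTracker:
    """Reassemble read data per tag. Completions of different tags may
    interleave; parts of the same tag are assumed to arrive in order."""

    def __init__(self):
        self.pending = {}  # tag -> {"received": byte count, "buf": bytearray}

    def issue(self, tag, length):
        if tag in self.pending:
            raise ValueError("tag already in use")
        self.pending[tag] = {"received": 0, "buf": bytearray(length)}

    def on_completion(self, tag, data):
        """Store one partial completion. Returns the full data once the
        request is complete, otherwise None."""
        req = self.pending[tag]
        start = req["received"]
        if start + len(data) > len(req["buf"]):
            raise ValueError("more data than requested")
        # In-order delivery per tag lets us simply append at the offset.
        req["buf"][start:start + len(data)] = data
        req["received"] += len(data)
        if req["received"] == len(req["buf"]):
            del self.pending[tag]
            return bytes(req["buf"])
        return None
```

In use, you call `issue()` when a request goes out and `on_completion()` for every completion that comes back, whatever order the tags arrive in.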
One of the most important aspects of high performance DMA read transfers is to perform them in an interleaved way, i.e. issue new read requests while old read requests are still pending. This way you can reach the maximum PCIe transfer rate with little CPU overhead.
Note that the PCIe hard/soft IP tells you the maximum allowed read request size: it comes from one of the PCI(e) configuration space registers, whose contents are repeatedly distributed on the tl_* signal outputs. As a safe default, though, you could simply cap your requests at 128 bytes, the smallest value that setting can take, and skip this optimization path.
Remember that once you intend to have multiple read requests pending, you must manage your read request credit carefully, so that you neither overrun your completion reception buffers nor overrun the system by monopolizing the ‘bus’, i.e. the PCIe switches and the north bridge. My recommended reading on this subject is still
http://www.xilinx.com/support/documentation/user_guides/v6_pcie_ug517.pdf, Appendix E. And, since the subject here is still request timeouts: you have to watch your requests closely and time them out properly. Handling the read request credit properly in this case is a pitfall, though.
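As a rough illustration of credit bookkeeping with timeouts, here is a small Python model. Everything in it is an assumption made for the sketch: the class name `ReadCreditManager`, the byte-based credit, the limit on outstanding requests and the abstract time units. It is not taken from the Xilinx guide. It also returns the credit as soon as a request times out, which is only safe if no late completion can still arrive for that tag; a real design has to decide that case explicitly.

```python
class ReadCreditManager:
    """Limit outstanding reads by buffer space and request count,
    and time out requests that never complete."""

    def __init__(self, buffer_bytes, max_outstanding, timeout):
        self.free_bytes = buffer_bytes
        self.max_outstanding = max_outstanding
        self.timeout = timeout
        self.pending = {}  # tag -> (size, deadline)

    def try_issue(self, tag, size, now):
        """Reserve credit for a read; returns False if it must wait."""
        if (tag in self.pending
                or len(self.pending) >= self.max_outstanding
                or size > self.free_bytes):
            return False
        self.free_bytes -= size
        self.pending[tag] = (size, now + self.timeout)
        return True

    def complete(self, tag):
        """All data for this tag has arrived: return its credit."""
        size, _ = self.pending.pop(tag)
        self.free_bytes += size

    def expire(self, now):
        """Drop requests whose deadline has passed; returns their tags.
        Simplification: credit is returned immediately."""
        expired = [t for t, (_, deadline) in self.pending.items()
                   if now >= deadline]
        for tag in expired:
            size, _ = self.pending.pop(tag)
            self.free_bytes += size
        return expired
```

The issuing logic would call `try_issue()` before sending each request and hold the request back whenever it returns False, which is what keeps several reads in flight without overrunning the completion buffer.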
– Matthias