Hi dsl,
Thank you for your reply!
We do not have a PCIe analyzer available to check the TLP length on the wire. Let me double-check whether there are any registers available for these statistics.
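In the meantime, the configured upper limits (not the actual on-wire TLP length) can be read from the Device Control register in the PCI Express capability; `lspci -vv` prints them as MaxPayload and MaxReadReq. A minimal sketch of the decoding, assuming the raw 16-bit register value has already been read out (the function name and the example value are only for illustration, not numbers from our boards):

```python
def decode_devctl(devctl: int) -> tuple[int, int]:
    """Decode a raw PCIe Device Control register value.

    Returns (max_payload_size, max_read_request_size) in bytes.
    Both fields use the same encoding: 0 -> 128 bytes, 1 -> 256, ... 5 -> 4096.
    """
    mps_field = (devctl >> 5) & 0x7    # bits 7:5, Max_Payload_Size
    mrrs_field = (devctl >> 12) & 0x7  # bits 14:12, Max_Read_Request_Size
    if mps_field > 5 or mrrs_field > 5:
        raise ValueError("reserved size encoding in Device Control register")
    return 128 << mps_field, 128 << mrrs_field


if __name__ == "__main__":
    # Hypothetical register value, used only to exercise the decoder.
    mps, mrrs = decode_devctl(0x2810)
    print(f"MaxPayload {mps} bytes, MaxReadReq {mrrs} bytes")
```

These are only the negotiated ceilings; a DMA engine may still issue smaller TLPs than the limits allow.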
The memory-to-memory DMA throughput is about 400MB/s, so we can rule out memory as the bottleneck.
If we initiate the DMA test from the RP instead, we get the following numbers: RP read 147MB/s, RP write 174MB/s.
There are two things that I can think of:
1/ The e1000e (as the EP device) uses some better approach to drive its read DMA when fetching data from the RP, so it gets much better throughput and can sustain the Gigabit Ethernet interface.
2/ This low throughput is a hardware constraint of the EP board. The PCIe core on the EP board supports 2 outstanding outbound read requests.
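
To put a rough number on point 2/: with a fixed number of outstanding reads, read throughput cannot exceed (outstanding requests x bytes per request) / round-trip latency. A minimal sketch of that bound follows; the 512-byte read request size and the 5 us round-trip latency are assumptions picked for illustration, not values measured on our boards.

```python
def read_throughput_bound(outstanding: int, request_bytes: int, round_trip_s: float) -> float:
    """Upper bound on read throughput in bytes/s when at most `outstanding`
    read requests of `request_bytes` each can be in flight at once."""
    if outstanding <= 0 or request_bytes <= 0 or round_trip_s <= 0:
        raise ValueError("all arguments must be positive")
    return outstanding * request_bytes / round_trip_s


def implied_round_trip(outstanding: int, request_bytes: int, throughput_bps: float) -> float:
    """Round-trip latency in seconds that would explain an observed throughput,
    if the outstanding-request limit were the only bottleneck."""
    if outstanding <= 0 or request_bytes <= 0 or throughput_bps <= 0:
        raise ValueError("all arguments must be positive")
    return outstanding * request_bytes / throughput_bps


if __name__ == "__main__":
    # Assumed values: 2 outstanding reads (from the EP core), 512-byte
    # requests and 5 us round trip (both hypothetical).
    bound = read_throughput_bound(2, 512, 5e-6)
    print(f"bound: {bound / 1e6:.1f} MB/s")

    # Latency that a 147 MB/s read rate would imply under the same assumptions.
    latency = implied_round_trip(2, 512, 147e6)
    print(f"implied round trip: {latency * 1e6:.2f} us")
```

If the real request size or latency differs, the bound scales linearly with either one, so this only shows whether 2 outstanding requests is a plausible limiter.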