Altera_Forum
13 years ago
The underlying problem is that PCIe isn't a 'bus' protocol, but an HDLC-style packet comms protocol. So a read is two packets (TLPs), one carrying the request and the other the response (plus the ones that return flow-control credit); a posted write needs no response packet. Each request can transfer a reasonable number of bytes (probably 128 or 256) - so while the achievable throughput is high, so is the latency.
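To put rough numbers on that, here is a minimal sketch of the latency-bound case, where each request has to complete before the next one is issued. The function name and the 1000 ns round trip used in the note below are assumptions for illustration only, not measurements from any particular board.

```c
#include <stdint.h>

/* Hypothetical model: throughput in bytes/second when exactly one request
 * is outstanding at a time, so one payload moves per round trip.
 * round_trip_ns is an assumed figure and must be non-zero. */
static uint64_t sync_throughput_bytes_per_sec(uint32_t payload_bytes,
                                              uint32_t round_trip_ns)
{
    return ((uint64_t)payload_bytes * 1000000000ull) / round_trip_ns;
}
```

With an assumed 1000 ns round trip, a 4-byte read per request works out to 4,000,000 bytes/s, while a 256-byte request works out to 256,000,000 bytes/s. The latency is the same in both cases; only the amount of data moved per round trip changes.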
With PIO requests the reads are synchronous, so you'll almost certainly have separate requests (TLPs) for every 32-bit transfer, and the transfers won't overlap. The writes fare a lot better: the requests can be performed asynchronously, so they will overlap.

The only way to get reasonable throughput is to generate TLPs that request larger data blocks. Typically this requires a DMA engine that is tightly coupled with the PCIe master logic. Since your test repeatedly accesses the same location, you are forcing small transfers to be used - even if the master is capable of merging the requests (it might for writes - but that might require the use of write-combining instructions). I had to write a driver (well, code to drive!) for the PCIe DMA engine on the little PPC we run Linux on.

Also, if you need to access a FIFO (rather than a memory block), you probably want to alias the FIFO to a few KB of address space so that long TLPs can be used to access it.
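A rough software model of the FIFO aliasing idea, in case it helps: the device-side decode ignores the offset within the window, so a burst to incrementing addresses (as one long TLP would carry) lands in the FIFO word by word. All names and sizes here (fifo_model, FIFO_WINDOW_BYTES, FIFO_DEPTH) are made up for illustration; the real decode lives in the FPGA logic and will look different.

```c
#include <stddef.h>
#include <stdint.h>

#define FIFO_WINDOW_BYTES 4096u  /* hypothetical alias window: 4 KB */
#define FIFO_DEPTH        1024u  /* hypothetical depth in 32-bit words */

struct fifo_model {
    uint32_t data[FIFO_DEPTH];
    size_t   count;              /* number of words currently queued */
};

/* Device-side decode: any word-aligned offset inside the window pushes
 * to the FIFO; the offset itself is otherwise ignored.
 * Returns 0 on success, -1 if the offset is invalid or the FIFO is full. */
static int fifo_window_write(struct fifo_model *f, uint32_t offset,
                             uint32_t value)
{
    if (offset >= FIFO_WINDOW_BYTES || (offset & 3u) != 0)
        return -1;
    if (f->count >= FIFO_DEPTH)
        return -1;
    f->data[f->count++] = value;
    return 0;
}

/* Models one long write: n words at incrementing addresses, all of which
 * end up in the FIFO in order. Stops at the first failed word. */
static int fifo_burst_write(struct fifo_model *f, uint32_t start_offset,
                            const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t offset = start_offset + (uint32_t)(i * 4u);
        if (fifo_window_write(f, offset, src[i]) != 0)
            return -1;
    }
    return 0;
}
```

Without the alias, every word would have to target the same address, which is exactly the access pattern that forces one small transfer per word.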