Forum Discussion
Altera_Forum
Honored Contributor
13 years ago

--- Quote Start ---
The CPU has to adhere to the (strict) PCI ordering rules, so when a machine instruction indicates a move from the device memory BAR, this CPU has to wait on the result, no matter what. The CPU cannot know that you are not actually needing the data right away and there wouldn’t be any side effects from doing other work while the returned data is still in flight. This can only be worked around by a DMA engine, device-local or system-global. – Matthias
--- Quote End ---

That is a non sequitur. A modern CPU will execute other instructions following a PCIe memory read, provided they aren't dependent on the value being read. Memory reads can be re-ordered, so other locations can be read. Any PCIe transfers must be sequenced, but that doesn't affect other operations.

The 'problem' is that PCIe reads to the Altera FPGA are very slow (I don't know if this is typical of PCIe slaves, I've not timed any others), so the CPU quickly runs out of instructions it can execute before the PCIe read completes.

A single PCIe transfer (which is 2 (maybe more) HDLC-style frames: a read request plus one or more completions) can usually contain up to 128 bytes. The transfer time is largely independent of the transfer length, and IIRC is of the order of 1-2us. This may be shorter than the interrupt latency plus a process reschedule.

So although it may be necessary to use a DMA controller to generate the long transfer, it can make sense to wait synchronously for completion by polling the 'dma done' bit. Splitting the 'setup' from the 'wait for complete' allows overlapping within the driver (e.g. processing the previous block, or getting the next block ready) without the overhead and complexity of a fully asynchronous DMA.
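
As a rough illustration of splitting the 'setup' from the 'wait for complete', here is a minimal sketch in C. It does not touch real hardware: struct fake_dma, dma_start(), dma_done(), dma_wait() and fetch_and_sum() are all made-up names, and the 'engine' is simulated in host memory (it pretends to finish on the third status poll). In a real driver dma_start() would write the engine's registers and dma_done() would read its 'dma done' bit; the details depend entirely on the DMA controller you have.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { BLOCK = 128 };            /* bytes per transfer (assumed) */

/* Hypothetical DMA engine, simulated in host memory. */
struct fake_dma {
    const uint8_t *src;          /* stands in for device memory */
    uint8_t *dst;                /* host buffer */
    size_t len;
    unsigned polls_left;         /* simulation only: polls until 'done' */
    int busy;
};

/* 'Setup': program the transfer and return straight away. */
static void dma_start(struct fake_dma *d, const uint8_t *src,
                      uint8_t *dst, size_t len)
{
    assert(!d->busy);            /* one transfer at a time */
    d->src = src;
    d->dst = dst;
    d->len = len;
    d->polls_left = 3;           /* pretend it completes on the 3rd poll */
    d->busy = 1;
}

/* Stand-in for reading the 'dma done' bit. */
static int dma_done(struct fake_dma *d)
{
    if (d->busy && --d->polls_left == 0) {
        memcpy(d->dst, d->src, d->len);   /* data 'arrives' */
        d->busy = 0;
    }
    return !d->busy;
}

/* 'Wait for complete': synchronous poll, returns the number of polls. */
static unsigned dma_wait(struct fake_dma *d)
{
    unsigned polls = 0;
    do {
        polls++;
    } while (!dma_done(d));
    return polls;
}

static uint32_t sum_block(const uint8_t *p, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* Fetch nblocks blocks; block i is processed after block i+1 is started. */
static uint32_t fetch_and_sum(const uint8_t *dev_mem, size_t nblocks)
{
    struct fake_dma dma = {0};
    uint8_t buf[2][BLOCK];       /* double buffer */
    uint32_t total = 0;

    if (nblocks == 0)
        return 0;
    dma_start(&dma, dev_mem, buf[0], BLOCK);
    for (size_t i = 0; i < nblocks; i++) {
        dma_wait(&dma);          /* block i is now in buf[i & 1] */
        if (i + 1 < nblocks)     /* kick off the next block first... */
            dma_start(&dma, dev_mem + (i + 1) * BLOCK,
                      buf[(i + 1) & 1], BLOCK);
        total += sum_block(buf[i & 1], BLOCK);  /* ...then process this one */
    }
    return total;
}
```

The point is only the ordering inside the loop: the next transfer is started before the current block is processed, so on real hardware the processing time would hide (some of) the transfer latency, and there is no interrupt handler or completion callback to manage.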