Altera_Forum
Honored Contributor
13 years ago
You need to do some throughput analysis. I'm not sure you'll actually manage to transfer data at that rate (seems high!) - and then manage to process it somewhere before the next frame arrives.
Thinks: 4MB at 30fps is 120MB/s - about gigabit ethernet speed. Or, if you have a 120MHz clock (you won't run an fpga much faster), 1 clock per byte, or 4 clocks per 32bit word. And that is a frame average, you probably need to worry about the slightly higher mid-line pixel clock.

The first thing to realise is that PCIe isn't really a bus protocol (like PCI) but much more like a communications protocol using HDLC frames. A large PCIe write is split into multiple requests, each of typically 128 bytes (the size is negotiated), a small number of which can be outstanding at any one time. When the target has actually written the data it sends an ack packet back to the originator - which then knows that the transfer has completed and can send the next fragment. (Actually it is a bit more complex than that!) All the state engine work (etc) slows it all down way below the nominal speed of the PCIe link itself.

I don't know exactly how the Avalon PCIe master side works - we've only used the slave (master is a small ppc). In order to generate a long PCIe request, the initiating PCIe block needs to know that its user (the DMA block in your case) is going to request another cycle; for writes it could be buffering data until the Avalon burst ends (reads are much more tricky). Once it has decided it has enough data for a PCIe transfer, the PCIe transfer can be initiated. It can then look for more data for the next transfer. Somewhere there needs to be a FIFO - to guarantee that the PCIe block can be fed data every clock (for whichever clock is relevant!).

I think this all means that the DMA transfer length can be much larger than 512 bytes, but you may need to add some kind of FIFO between the video source and the PCIe. Possibly writing each video line to alternate memory blocks (4k each?) and transferring each in turn. (You might need 4 blocks to handle jitter...)
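For anyone who wants to redo the back-of-envelope numbers with their own frame size or clock, here is a rough Python sketch. The variable names are made up for illustration, "4MB" is taken as 4,000,000 bytes, and 128 bytes is only the typical payload size mentioned above, not a value read from any real link.

```python
# Back-of-envelope throughput check; every constant here is an assumption.
FRAME_BYTES = 4 * 1000 * 1000    # "4MB" per frame, taken as 4e6 bytes
FPS = 30                         # frames per second
CLOCK_HZ = 120 * 1000 * 1000     # assumed 120MHz fabric clock
MAX_PAYLOAD = 128                # typical (negotiated) PCIe payload, bytes

bytes_per_sec = FRAME_BYTES * FPS
clocks_per_byte = CLOCK_HZ / bytes_per_sec
clocks_per_word = clocks_per_byte * 4              # 32bit word

# Number of PCIe write requests needed per frame (ceiling division).
packets_per_frame = -(-FRAME_BYTES // MAX_PAYLOAD)
packets_per_sec = packets_per_frame * FPS
clocks_per_packet = CLOCK_HZ / packets_per_sec     # clock budget per request

print(f"{bytes_per_sec / 1e6:.0f} MB/s")
print(f"{clocks_per_byte:.1f} clocks/byte, {clocks_per_word:.1f} clocks/word")
print(f"{packets_per_frame} requests/frame, {clocks_per_packet:.0f} clocks each")
```

With these numbers there are 128 clocks available per 128 byte request, and that budget has to cover all the per-request overhead as well as the data itself. That is a frame average, so the mid-line budget is tighter still.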
Some of this would be easier if there was a DMA engine inside the PCIe block (which would take a PCIe address as its target), rather than the PCIe block being an Avalon slave and mapping ranges of Avalon address space to PCIe space.
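To get a feel for why two alternating line blocks might not be enough and four might be, here is a toy Python model of the per-line memory blocks suggested above. It is only a sketch under invented assumptions: one video line arrives per time unit, the PCIe side drains one block at a time, and the drain times (in line-times) are made up, with a single stall standing in for jitter. The function name `dropped_lines` is hypothetical.

```python
def dropped_lines(num_blocks, drain_times):
    """Count video lines lost because no memory block was free.

    Line i starts arriving at time i and needs a free block at that moment.
    Its block is drained over PCIe once the line is complete (time i + 1)
    and the previous drain has finished; draining takes drain_times[i].
    """
    finish = []            # drain-finish times of lines that got a block
    drain_free_at = 0.0    # when the PCIe side finishes its current transfer
    dropped = 0
    for i, drain in enumerate(drain_times):
        # Blocks still being filled or drained when line i starts arriving.
        in_use = sum(1 for f in finish if f > i)
        if in_use >= num_blocks:
            dropped += 1   # every block busy: this line is lost
            continue
        start = max(i + 1, drain_free_at)
        drain_free_at = start + drain
        finish.append(drain_free_at)
    return dropped


# Drains normally take half a line-time; one transfer stalls for 3 line-times.
jittery = [0.5] * 4 + [3.0] + [0.5] * 8

print(dropped_lines(2, jittery))   # plain ping-pong pair
print(dropped_lines(4, jittery))   # four blocks
```

In this made-up scenario the ping-pong pair loses lines while the stalled transfer is outstanding, and four blocks ride it out. Real stall lengths depend on the host and the link, so the block count needs checking against measured worst-case latency.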