If you are a beginner, I wouldn’t recommend any device-to-device transfers at this time. I wouldn’t even recommend trying DMA yet, because even the simplest operations already carry the high overhead of designing a complex OS driver. Most beginners start with a Qsys or SOPC Builder example, or maybe an SGDMA with a ready-made driver, but this will not really help in understanding the PCIe TLPs and driver mechanisms needed for inter-I/O accesses later on.
So if you really want to start with the Avalon ST interface you have to become friends with TLPs. A short overview of PCI Express and TLPs is given here (http://billauer.co.il/blog/2011/03/pci-express-tlp-pcie-primer-tutorial-guide-1/), but there is plenty of free information on the net. Then grab a fresh PDF copy of LDD3 (http://lwn.net/kernel/ldd3/), i.e. Linux Device Drivers, Third Edition.
Next take a hard IP block, configure it, keep the Avalon ST bus open and load it into the FPGA. Boot into Linux and look at what’s happening to your PCI enumeration: Try to get used to lspci, and correlate the hard IP settings with the output of lspci at varying verbosity levels (-v, -vv, -vvv).
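To make the correlation with lspci concrete, here is a minimal sketch that decodes a few standard fields from the start of configuration space. On Linux the raw bytes can be read from the sysfs file /sys/bus/pci/devices/<domain:bus:dev.fn>/config; the sample bytes below are made up for illustration (0x1172 is Altera's vendor ID, the device ID and class code are assumptions), and the function name is hypothetical.

```python
import struct

def decode_config_header(cfg: bytes) -> dict:
    """Decode a few fields from the first 16 bytes of PCI configuration space."""
    if len(cfg) < 16:
        raise ValueError("need at least 16 bytes of configuration space")
    # Configuration space is little-endian.
    vendor, device, command, status = struct.unpack_from("<HHHH", cfg, 0)
    revision = cfg[0x08]
    # Offsets 0x09..0x0B: programming interface, subclass, base class.
    class_code = cfg[0x09] | (cfg[0x0A] << 8) | (cfg[0x0B] << 16)
    return {
        "vendor_id": vendor,
        "device_id": device,
        "memory_space_enable": bool(command & 0x0002),  # Command bit 1
        "bus_master_enable": bool(command & 0x0004),    # Command bit 2
        "interrupt_disable": bool(command & 0x0400),    # Command bit 10
        "revision_id": revision,
        "class_code": class_code,
    }

# Synthetic header: vendor 0x1172, assumed device ID 0x0004,
# Command = 0x0006 (memory space + bus master enabled), class 0xFF0000.
example = (struct.pack("<HHHH", 0x1172, 0x0004, 0x0006, 0x0010)
           + bytes([0x01, 0x00, 0x00, 0xFF]) + bytes(4))
header = decode_config_header(example)
```

Changing the vendor ID, device ID or class code in the hard IP settings and watching these fields change is a quick way to verify that the design you loaded is the one being enumerated.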
A next step could be to use an LED on the FPGA development board and try to switch it on and off from the operating system driver by issuing PIO write requests from the CPU. Writing and understanding such an initial driver is already a lot of work, but try hard to bring things up this way. The Linux source tree contains quite a few complex drivers, but there are small ones available as well. Lean towards a simple PCI-based driver. Don’t skip sections in LDD3; you won’t understand later parts otherwise.
The next thing to try could be a button or switch which is read by the driver, which requires your Avalon ST application to create completions, i.e. read data response packets. printk() is your friend here if you don’t want to create complete character devices in the first attempt. Properly copy the transaction descriptor (requester ID, tag, traffic class and attributes) from the request into the completion so that the completion is correlated to the request from the CPU. Note that PIO reads are slow by nature and should be avoided wherever you want your CPU to act in a highly performant way.
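As an illustration of which request fields go where, here is a minimal sketch that builds the three header DWs of a completion with data (CplD) for a single-DW memory read with a 3DW request header. It only models the header fields as defined by the PCIe specification; the function names are hypothetical, and the mapping of these DWs onto the AST TX bus is not covered.

```python
def first_be_to_offset_and_count(first_be: int) -> tuple:
    """For a 1-DW read: offset of the first enabled byte and the byte count."""
    if first_be == 0:
        return 0, 1  # zero-length read: byte count is reported as 1
    lowest = (first_be & -first_be).bit_length() - 1
    highest = first_be.bit_length() - 1
    return lowest, highest - lowest + 1

def make_cpld_header(req_dw0: int, req_dw1: int, req_dw2: int,
                     completer_id: int) -> tuple:
    """Build CplD header DWs for a 1-DW MRd request with a 3DW header."""
    tc_attr = req_dw0 & 0x00703000           # TC [22:20] and Attr [13:12]
    requester_id = (req_dw1 >> 16) & 0xFFFF  # copied from the request
    tag = (req_dw1 >> 8) & 0xFF              # copied from the request
    offset, byte_count = first_be_to_offset_and_count(req_dw1 & 0xF)
    lower_addr = (req_dw2 & 0x7C) | offset   # address bits [6:0] of first byte
    cpl_dw0 = (0x4A << 24) | tc_attr | 1     # Fmt/Type = CplD, Length = 1 DW
    cpl_dw1 = (completer_id << 16) | byte_count  # status 000b = successful
    cpl_dw2 = (requester_id << 16) | (tag << 8) | lower_addr
    return cpl_dw0, cpl_dw1, cpl_dw2

# Read of one byte (first BE = 0100b) at DW address 0xF0000048,
# requester ID 0x0018, tag 5; completer ID 0x0100 is an assumed value.
cpl = make_cpld_header(0x00000001, 0x00180504, 0xF0000048, 0x0100)
```

If the tag or requester ID in the completion does not match the request, the root complex will treat it as an unexpected completion and the CPU read will eventually time out.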
A possible next step would be to map an interrupt signal to an Interrupt Service Routine (ISR) in your driver and create the appropriate TLPs or signalling. Remember to issue legacy INTA or MSI interrupts correctly, depending on the device settings in configuration space. There are quite a few interesting things available from configuration space using the tl_cfg_* signals, and the Altera PCI Express User Guide will tell you how to get access to the registers.
Before digging further into DMA accesses, I highly recommend cleaning up your AST TX and RX paths: Make sure you properly handle IDLE and WAIT conditions and set up your architecture in such a way that it honors the PCIe transaction ordering rules and the very special needs of Avalon ST. For the case of interrupts, you might notice a difference in the handling of legacy INTA interrupts and MSI. Additionally, if not handled properly, an MSI requested after sending out written data might overtake the data and lead to a race condition. Learn how to avoid race conditions and why the ordering rules are your friends here. Ordering rules in hardware are complemented by memory barriers in the driver (http://www.mjmwired.net/kernel/documentation/memory-barriers.txt), so get used to them as well.
The easiest DMA operation is a DMA write access to main memory. Create a PIO register, like the one for the LED, that will be initialized by the driver to carry the address of a reserved main memory location which the PCIe device should write to. Your AST application can then create DMA write accesses to this location so that the memory location reflects your button or switch. Observe how the data byte has to be put at different AST TX bit positions depending on 3DW/4DW addressing and on the alignment of the given address, which also controls the byte enable indication.
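The dependence of the header fields on address and alignment can be sketched as follows. This is a minimal model of the memory write header fields only (length in DWs, first/last byte enables, 3DW versus 4DW header); the function name is hypothetical, and the exact placement of the payload on the AST TX bus is described in the user guide and not modeled here.

```python
def mwr_fields(addr: int, nbytes: int) -> dict:
    """Header fields for a memory write of nbytes starting at byte address addr."""
    if nbytes <= 0:
        raise ValueError("nbytes must be positive")
    last = addr + nbytes - 1                   # address of the last byte
    length_dw = (last >> 2) - (addr >> 2) + 1  # DWs touched by the transfer
    first_be = (0xF << (addr & 3)) & 0xF       # bytes enabled in the first DW
    last_be = 0xF >> (3 - (last & 3))          # bytes enabled in the last DW
    if length_dw == 1:
        # A single-DW request carries everything in first BE; last BE is 0.
        first_be &= last_be
        last_be = 0
    return {
        "length_dw": length_dw,
        "first_be": first_be,
        "last_be": last_be,
        # Addresses at or above 4 GB need a 4DW header, others a 3DW header.
        "four_dw_header": (addr >> 32) != 0,
    }

one_byte = mwr_fields(0x1001, 1)        # single byte in byte lane 1
unaligned = mwr_fields(0x1002, 6)       # spans two DWs
high = mwr_fields(0x1_0000_0000, 4)     # above 4 GB
```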
Vary the address alignment and try different packet data sizes. Mind the limits of the PCIe spec, i.e. not crossing 4 KB boundaries, respecting the maximum payload size etc. Note that you are only allowed to issue DMA read or write requests if you are allowed to act as a master, which is indicated by the Bus Master Enable bit in the Command register of the configuration space. Note that most BIOS versions enable DMA on all devices, even if they don’t have to contribute to the boot process, so you should add validation of your own, such as checking whether the address register still holds NULL, i.e. its reset state.
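A minimal sketch of these checks, under the assumption that a transfer is simply cut at every multiple of the maximum payload size: since the allowed payload sizes all divide 4096, no chunk can then cross a 4 KB boundary or exceed the payload limit, even for unaligned start addresses. The function names are hypothetical.

```python
VALID_PAYLOAD_SIZES = (128, 256, 512, 1024, 2048, 4096)

def dma_allowed(command_reg: int, target_addr: int) -> bool:
    """Bus Master Enable (Command bit 2) set and address register initialized."""
    return bool(command_reg & 0x0004) and target_addr != 0

def split_transfer(addr: int, nbytes: int, max_payload: int = 128) -> list:
    """Split a transfer into (address, size) chunks cut at max_payload multiples."""
    if max_payload not in VALID_PAYLOAD_SIZES:
        raise ValueError("invalid maximum payload size")
    chunks = []
    while nbytes > 0:
        # Next address that is a multiple of max_payload (hence never past 4 KB).
        boundary = (addr // max_payload + 1) * max_payload
        size = min(nbytes, boundary - addr)
        chunks.append((addr, size))
        addr += size
        nbytes -= size
    return chunks

# 256 bytes starting 64 bytes below a 4 KB boundary, payload limit 128 bytes.
chunks = split_transfer(0x0FC0, 0x100, 128)
```

Here the first chunk is shortened to 64 bytes so that the second one starts exactly at 0x1000. Cutting at payload-size multiples is only one possible policy; it trades a short first packet for aligned packets afterwards.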
The next step could be to make a simple, repetitive DMA read access in a similar way: Poll a main memory location and make the LED go on and off as a bit is set or cleared by software. Wrap your head around tags and their proper re-use, ingress completion unshuffling and timeout handling, again TLP ordering rules on RX, and of course the infamous completion credit handling. Again, try different memory read byte counts. Mind the maximum read request size, and learn to handle split read request completions.
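Tag re-use and split completions can be modeled in a few lines. The sketch below assumes the default of 32 tags and relies on the fact that the Byte Count field of each completion gives the bytes still outstanding for the request, including the current completion. Timeouts and completion credits are not modeled, and the class and method names are hypothetical.

```python
class TagTracker:
    """Tracks outstanding read requests and reassembles split completions."""

    def __init__(self, num_tags: int = 32):
        self.free = list(range(num_tags))
        self.pending = {}  # tag -> bytes still expected

    def issue_read(self, nbytes: int):
        """Allocate a tag for a read request; None means the caller must stall."""
        if not self.free:
            return None
        tag = self.free.pop(0)
        self.pending[tag] = nbytes
        return tag

    def on_completion(self, tag: int, byte_count: int, payload_bytes: int) -> bool:
        """Account for one completion; returns True when the request is done."""
        if tag not in self.pending:
            raise KeyError("unexpected completion for tag %d" % tag)
        if byte_count != self.pending[tag]:
            raise ValueError("byte count does not match outstanding bytes")
        remaining = self.pending[tag] - payload_bytes
        if remaining == 0:
            del self.pending[tag]
            self.free.append(tag)  # tag may be re-used only now
            return True
        self.pending[tag] = remaining
        return False

tracker = TagTracker(num_tags=2)
tag = tracker.issue_read(256)
first_done = tracker.on_completion(tag, byte_count=256, payload_bytes=64)
second_done = tracker.on_completion(tag, byte_count=192, payload_bytes=192)
```

Completions for different tags may arrive interleaved, which is why the state is kept per tag rather than per request queue position.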
When you have come that far, it’s time to talk about descriptors, which allow indirect control of the address space that can be used for data packet transfers in either direction. The descriptor tables are typically configured over PIO accesses, but your AST application will use mixed read and write accesses to read the descriptors and update them after data sending or reception. This is also a good point to learn how to gain transfer speed using no-snoop and relaxed ordering.
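To show the idea of a descriptor table shared between driver and device, here is a minimal sketch with an entirely hypothetical 16-byte descriptor layout (64-bit buffer address, 32-bit length, 32-bit flags) and made-up flag names. A bytearray stands in for the table in main memory; in a real design the device side would be DMA reads and writes issued by the AST application.

```python
import struct

DESC_FMT = "<QII"                      # address, length, flags (assumed layout)
DESC_SIZE = struct.calcsize(DESC_FMT)  # 16 bytes
FLAG_HW_OWNED = 0x1                    # driver handed the entry to the device
FLAG_DONE = 0x2                        # device finished and wrote back

class DescriptorRing:
    def __init__(self, entries: int):
        self.entries = entries
        self.mem = bytearray(entries * DESC_SIZE)  # stands in for main memory
        self.head = 0  # next slot the driver fills
        self.tail = 0  # next slot the device processes

    def post(self, buf_addr: int, length: int) -> bool:
        """Driver side: publish a buffer; False if the ring is full."""
        off = self.head * DESC_SIZE
        flags = struct.unpack_from(DESC_FMT, self.mem, off)[2]
        if flags & FLAG_HW_OWNED:
            return False
        struct.pack_into(DESC_FMT, self.mem, off, buf_addr, length, FLAG_HW_OWNED)
        self.head = (self.head + 1) % self.entries
        return True

    def device_process_one(self):
        """Device side: read one descriptor, then write it back as done."""
        off = self.tail * DESC_SIZE
        addr, length, flags = struct.unpack_from(DESC_FMT, self.mem, off)
        if not flags & FLAG_HW_OWNED:
            return None  # nothing to do
        struct.pack_into(DESC_FMT, self.mem, off, addr, length, FLAG_DONE)
        self.tail = (self.tail + 1) % self.entries
        return addr, length

ring = DescriptorRing(entries=2)
posted = [ring.post(0x1000, 64), ring.post(0x2000, 128), ring.post(0x3000, 1)]
processed = ring.device_process_one()
```

The ownership flag is what lets both sides work on the table without locking: each entry is written by exactly one side at a time, and the write-back of the flag must be ordered after the data transfer it refers to.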
Different input stream types, addresses, or priorities will make you think about multiple DMA channels, so that the hardware takes some of the work of separating them off the driver. Such separate queues will need separate interrupt and flow control handling, so this is the next challenge. Depending on the expected interrupt load, you might consider moving from MSI to MSI-X interrupts now, which requires more PIO-like registers.
At this point, you could look into combining two PCIe endpoints over main memory, which I described as the »easiest solution« in my former post.
You see, there are multiple steps to take to learn the tasks of a PCIe endpoint, so I wouldn’t recommend jumping straight into the most sophisticated Inter-I/O architecture right away without knowing the PCIe basics.
– Matthias