Altera_Forum
Honored Contributor
14 years ago

Hi Dave,
I don’t know which interface you’re going to use. If you are going for Avalon-ST or another transaction-level interface with DMA, you will need in-depth knowledge of inbound completion credit calculation. There is a well-written document from Xilinx that covers some possible algorithms in detail: see the Virtex-6 PCIe user guide (http://www.alteraforum.com/forum/www.xilinx.com/support/documentation/user.../v6_pcie_ug517.pdf), Appendix E.

There are at least two pitfalls I noticed when developing with Altera on the transaction-level AST interface.

The first is the signal rx_st_mask<n>, which you use to indicate that your logic cannot receive any more non-posted requests, such as PIO read requests from the CPU. Two things about it are unfortunate. First, you must still accept up to 14 (AST 64 bit) or even 26 (AST 128 bit) more non-posted requests after you have asserted this signal. Second, you are required not to hold back incoming completions and posted requests just because a read completion is still busy, so you cannot simply de-assert rx_st_ready<n>. Remember, there are transaction ordering rules in PCIe. You are asking for trouble in the form of deadlocks if you refuse to receive incoming transactions just because an outbound completion (for a read request) is blocking your RX port.

End result: you need a dedicated FIFO on the RX port capable of holding at least 14 non-posted requests (64 bit interface assumed). Even better, make it 16 or 20 deep so that you don’t trigger rx_st_mask<n> right away when the first non-posted request is received. This is different from, e.g., Xilinx, where this part of the buffering and transaction reordering is done by the IP (see Table 2–13, signal trn_rnp_ok_n).

De-assert rx_st_ready<n> only when your internal processing (not PCIe TX related) cannot take any more data, for example when the buffer for received completion data or the buffer for received PIO posted write data is full.
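To make the FIFO sizing above concrete, here is a small behavioural model in Python (not HDL). It is only a sketch: the class name, the `headroom` parameter and the rule "assert the mask once only the worst-case slack is left free" are my own assumptions for illustration, not something taken from the Altera user guide. Only the figures 14 and 26 come from the text above.

```python
from collections import deque

# Non-posted requests that may still arrive after rx_st_mask is asserted
# (figures quoted in the post: 14 for 64-bit AST, 26 for 128-bit AST).
MASK_SLACK = {64: 14, 128: 26}


class NonPostedFifo:
    """Hypothetical behavioural model of an RX FIFO for non-posted requests."""

    def __init__(self, ast_width=64, headroom=2):
        self.slack = MASK_SLACK[ast_width]
        # Extra entries on top of the slack, so the mask is not asserted
        # as soon as the very first request shows up.
        self.headroom = headroom
        self.depth = self.slack + headroom
        self.entries = deque()

    @property
    def rx_st_mask(self):
        # Assumed policy: assert the mask once the free space has shrunk
        # to the worst-case number of requests still in flight.
        return self.depth - len(self.entries) <= self.slack

    def push(self, request):
        if len(self.entries) >= self.depth:
            raise OverflowError("non-posted FIFO overflow: depth too small")
        self.entries.append(request)

    def pop(self):
        # Called when the TX side can finally issue the completion.
        return self.entries.popleft()


fifo = NonPostedFifo(ast_width=64, headroom=2)   # 16 entries deep
fifo.push("MRd #0")
fifo.push("MRd #1")                              # mask is asserted from here on
for i in range(fifo.slack):                      # worst case: 14 more arrive
    fifo.push(f"late MRd #{i}")
```

With `headroom=2` this gives the 16-entry FIFO suggested above: the mask goes up after two queued requests, and the 14 requests that may still arrive fit without ever touching rx_st_ready<n>.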
The second thing to keep in mind when designing for Altera PCIe: any outbound transaction must be sustained at line rate, and you have to be prepared to stream the whole transaction to PCIe at once. Although tx_st_valid<n> suggests (going by the Avalon-ST spec) that you can insert wait states into the data stream at will, the signal must stay asserted between tx_st_sop<n> and tx_st_eop<n> as long as the IP indicates that it is ready by asserting tx_st_ready<n>. You are not allowed to de-assert it just because you cannot supply the data at full rate and have to wait for it. Again, this is different from Xilinx, where you can choose between such a streamed mode of operation (trn_tstr_n='0') and an IP-level buffer (see Table 2–12 in the above-mentioned document).

Bottom line: either design your data source to supply data at full rate, or add an explicit transaction FIFO that starts to transmit a transaction to the IP only when it has been completely written to the FIFO.

Side note: this comparison with the competitor is not meant as an advertisement, nor as a list of all differences between the IP core interfaces (there are significantly more). It is meant to point out the major pitfalls where the designer’s assumptions about the IP core might not match the actual implementation, and where the wording of the Altera UG for PCIe might be misinterpreted at first reading.

One thing on which Altera still gives no proper guidance is the completion timeout mechanism. PCIe requires the application to implement the completion timeout. This means that for any outbound non-posted request (i.e. a DMA read request issued by the application) that does not receive any, or not enough, completion data within 50 μs to 50 ms (PCIe suggests not timing out quicker than 10 ms), the application must abort or retry the operation and indicate a fatal or non-fatal completion timeout error on cpl_err[1] or cpl_err[0], respectively.
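A rough sketch of what such application-side timeout bookkeeping could look like, again as a behavioural Python model rather than HDL. The class and method names are made up for illustration, and keeping one fixed deadline per request (not restarting it on partial completions) is my assumption, not something the Altera UG prescribes.

```python
class CompletionTracker:
    """Hypothetical model: outstanding read requests by tag, with timeout."""

    def __init__(self, timeout_us=10_000):
        # 10 ms default, inside the 50 us to 50 ms window mentioned above.
        self.timeout_us = timeout_us
        self.outstanding = {}  # tag -> [deadline_us, bytes_remaining]

    def issue_read(self, tag, length_bytes, now_us):
        if tag in self.outstanding:
            raise ValueError(f"tag {tag} is still outstanding")
        self.outstanding[tag] = [now_us + self.timeout_us, length_bytes]

    def on_completion(self, tag, payload_bytes):
        """Return True if the completion is expected, False otherwise."""
        entry = self.outstanding.get(tag)
        if entry is None:
            # Tag never issued, already satisfied, or already timed out:
            # the application has to treat this as an unexpected completion.
            return False
        entry[1] -= payload_bytes
        if entry[1] <= 0:
            del self.outstanding[tag]  # all requested data has arrived
        return True

    def expire(self, now_us):
        """Retire and return all tags whose deadline has passed."""
        expired = [tag for tag, (deadline, _) in self.outstanding.items()
                   if now_us >= deadline]
        for tag in expired:
            # In hardware, this is where the application would abort or
            # retry and signal the timeout on cpl_err[0] or cpl_err[1].
            del self.outstanding[tag]
        return expired


tracker = CompletionTracker(timeout_us=10_000)
tracker.issue_read(tag=5, length_bytes=256, now_us=0)
tracker.on_completion(tag=5, payload_bytes=128)   # first half arrives
timed_out = tracker.expire(now_us=10_000)         # second half never came
late = tracker.on_completion(tag=5, payload_bytes=128)
```

In this run `timed_out` contains tag 5, and the late completion is rejected because the tag has been retired. In real logic you would poll `expire` from a free-running microsecond counter.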
If you wonder how this is done in the Chaining DMA design example: stop wondering, it is actually not implemented :(.

What is more, the IP core claims that it handles unexpected completions properly, in particular if “[…] The completion packet has a tag that does not match an outstanding request.” (ref: Table 12–4, page 12–4 of the current UG). I would like to ask Altera how they think the IP knows which transactions are outstanding when the application is responsible for invalidating requests based on the timeout mechanism. At the end of the day, the application has to perform completion filtering by itself, which renders this automatism in the IP useless or even wrong.

– Matthias