Altera_Forum
Honored Contributor
15 years ago
Tightly-coupled interface to custom components
All experienced SOPC/Nios2 developers are probably aware of the fact that Nios2 CPU access to Avalon-MM components is pretty slow.
Specifically, a Nios2/f core with a non-burst data master generates Avalon-MM read transactions (0 ws, pipeline latency = 1 clock) at a max. rate of 1 transaction per 4 clocks, with 2 more clocks of latency for instructions that depend on the load result. The same core generates Avalon-MM write transactions (also 0 ws) at a max. rate of 1 transaction per 2 clocks. A Nios2/f core with a burst data master is slower yet: 6+2 clocks for a read and 4 clocks for a write.

I personally am a big fan of programmed I/O (PIO). For things that don't have to run at absolute maximum speed it is much simpler to program than DMA, less prone to strange errors (to name just one, how about cache line tearing?), more friendly to multitasking environments and, last but not least, it costs me little or nothing in terms of FPGA resources. In theory, for small packets PIO, especially in the write direction, could even end up faster than DMA.

The problem is that on Nios2 PIO is so slow that the slowness seriously reduces its applicability. The most depressing part is that I see no good technical reason for Nios2 PIO to be slow. Actually, the Nios2/f core has a near-perfect tool for fast PIO in the form of the tightly-coupled data port (TCM-DP). The limitations of the TCM-DP protocol, specifically point-to-point master-slave connection, zero wait states and one-clock pipeline latency, look perfectly reasonable. There is only one problem, but a lethal one: the damn SOPC builder refuses to connect TCM-DP to anything except Altera's own on-chip memory components.

I am writing this post more out of despair than in the hope of getting real help, but still... I guess my question is: "Does somebody know how to trick SOPC builder into accepting a custom Avalon-MM component on the TCM-DP port without things getting too ugly?" Or, more generally, I want to hear what the Honorable Altera Gurus think about ways of improving Nios2/f PIO performance, especially in the more problematic case of a burst data master.
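
To put the cycle counts quoted above into bandwidth terms, here is a minimal sketch in C. It is only arithmetic, not Nios2 code: the helper name `pio_peak_mbytes_per_sec` and the enum names are made up for illustration, the 32-bit transfer width is an assumption, and the clock frequency is whatever the caller passes in. The extra 2 clocks of latency for instructions that depend on a load result are deliberately left out, so the read figures are an upper bound.

```c
#include <assert.h>

/* Clocks per transaction, taken from the figures quoted in the post.
 * The enum names are hypothetical labels for this example only. */
enum {
    NONBURST_READ_CLOCKS  = 4, /* 1 read per 4 clocks  */
    NONBURST_WRITE_CLOCKS = 2, /* 1 write per 2 clocks */
    BURST_READ_CLOCKS     = 6, /* the "6" of "6+2"     */
    BURST_WRITE_CLOCKS    = 4  /* 1 write per 4 clocks */
};

/* Hypothetical helper: peak PIO bandwidth in MB/s (1 MB = 10^6 bytes)
 * for back-to-back transfers at the given clock and issue rate. */
double pio_peak_mbytes_per_sec(double clk_hz, unsigned clocks_per_transfer)
{
    /* Assumption: every transaction moves one 32-bit word. */
    const double bytes_per_transfer = 4.0;

    assert(clocks_per_transfer > 0);
    return clk_hz / clocks_per_transfer * bytes_per_transfer / 1e6;
}
```

With a purely illustrative 100 MHz clock, `pio_peak_mbytes_per_sec(100e6, NONBURST_READ_CLOCKS)` gives 100 MB/s and the non-burst write case gives 200 MB/s, while the burst data master drops to roughly 67 MB/s for reads and 100 MB/s for writes. A port that accepted one access per clock would sit at 400 MB/s under the same assumptions.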
One option I can think of myself is "creative" use of custom instructions. But multicycle custom instructions themselves are only marginally faster than Avalon-MM access, and hardware-wise they are not as cheap as TCM-DP. Besides, using custom instructions for I/O doesn't agree with my sense of aesthetics.

Thank you for your patience,
Michael