Hi all,

I am measuring Ethernet performance on a Nios II based board with an Altera TSE IP core running uClinux, and it's been disappointing so far. I've done timing using TCP and UDP, and barely get up to 12 Mbit/s.

I did some timings using UDP and a 1024-byte message on a bind()ed and connect()ed socket:
Total time to send a message (from when sendto() or write() is called until it returns): 650us
Time spent in the driver (using the atse.c driver; the altera_tse.c driver gives similar performance): 140us
Time in the driver waiting for the hardware to send data: 80us (it seems to be running at 100Mbit instead of GbE, but that's a relatively small issue here considering overall performance)
Kernel overhead before entering driver code: ~400us
Kernel overhead after exiting driver code: ~100us

Timing was done by writing to an output pin and using an oscilloscope.

It seems there's just a lot of overhead in the Linux IP stack on the Nios II. Has anybody been able to get better performance, or is it hopeless given the speed of the processor?
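As a quick sanity check on the figures above, the per-message times can be converted to throughput. This is only a sketch of the arithmetic; the helper name throughput_mbps is made up for illustration and is not part of any driver or kernel API.

```c
#include <stddef.h>

/*
 * Convert a payload size and an elapsed time into Mbit/s.
 * Bits divided by microseconds is numerically equal to Mbit/s.
 * (Hypothetical helper, for illustration only.)
 */
static double throughput_mbps(size_t payload_bytes, double elapsed_us)
{
    if (elapsed_us <= 0.0)
        return 0.0;   /* avoid dividing by zero on a bad measurement */
    return (double)payload_bytes * 8.0 / elapsed_us;
}
```

With the numbers from the post, throughput_mbps(1024, 650.0) comes out at roughly 12.6 Mbit/s, which matches the "barely 12 Mbit/s" observation, and throughput_mbps(1024, 80.0) comes out at roughly 102 Mbit/s, which is consistent with the link running at 100Mbit rather than GbE. The three phases listed (about 400 + 140 + 100 us) also roughly account for the 650us total.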

This has already been discussed here several times. Out of the box, Linux is versatile, but not fast with TCP/IP, and the Nios II is quite a slow processor. So I don't think you will achieve much more throughput on such a system without introducing special "router" techniques like a "Zero-Copy Stack" (which needs a special Ethernet driver).
-Michael

I've been doing some research on Zero-Copy stacks and haven't found any implementation for Linux that I could try to use. All I've found is sendfile() and a suggestion to use an mmapped file as a buffer as a hack to make use of sendfile() to remove the user-to-kernel copy normally done on send. I haven't been able to get this working because I get this error trying to use mmap for a size greater than a couple KB, even though malloc works fine to allocate the space:

Error mmapping the file: Cannot allocate memory

Regardless, memcpying a 50KB buffer takes 32us, which is still only a small part of the overhead I'm seeing, so I don't think zero-copy will help all that much. Still, I'd be happy to hear any suggestions of how to implement it for uClinux on NIOS II.

So far the only way I've found to get decent bandwidth is to send large UDP datagrams.

So you found that UDP is a lot faster than TCP, and that copying overhead is not the major problem with TCP? Of course UDP imposes less overhead than TCP, but I did not think the difference would be that important. Thanks for letting us know!
-Michael

I did mention I'm doing some tests with TCP as well, but all the above results are with UDP. With UDP, there is over 500us of overhead per message, most of which, by my understanding, is not due to copying the buffer.
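A fixed per-message cost of this size would also explain why large datagrams help: the overhead is paid once per send call and amortized over more payload. The function below is a deliberately simple model of that, not a measurement. It assumes the overhead is constant per call and ignores IP fragmentation and any per-fragment driver cost, so real numbers for large datagrams will be lower. The name modeled_throughput_mbps and its parameters are hypothetical.

```c
#include <stddef.h>

/*
 * Simple amortization model (an assumption, not a measurement):
 *   time per message = fixed software overhead + time on the wire
 * overhead_us : fixed per-call cost in microseconds
 * link_mbps   : raw link rate in Mbit/s (bits per microsecond)
 */
static double modeled_throughput_mbps(size_t payload_bytes,
                                      double overhead_us,
                                      double link_mbps)
{
    double bits = (double)payload_bytes * 8.0;

    if (link_mbps <= 0.0)
        return 0.0;   /* no link, no throughput */

    double wire_us = bits / link_mbps;        /* serialization time */
    return bits / (overhead_us + wire_us);    /* effective Mbit/s */
}
```

With 570us of overhead on a 100Mbit link, the model gives roughly 12.6 Mbit/s for a 1024-byte message, in line with the measurement earlier in the thread, while a 50KB datagram would be dominated by wire time instead of software overhead.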

You can improve this by increasing the cache size or reducing memory interface latency. - Hippo

Ethernet performance using TSE in uClinux | Altera Community


Altera_Forum
Honored Contributor
15 years ago
Do you have the same problem with UDP? Is it really corruption, or lost packets?
I had a problem a while ago with slow TCP connections on both ucOS and eCos, where a lot of TCP packets were lost, causing lots of retransmissions and a very low speed.
I found out that the problem came from the fact that the IP stack was too slow to process the received packets, the TSE FIFO became full and some packets were lost.
The workaround I found was to increase the TSE receive FIFO so that it was higher than the TCP window size, but this only works if you have only one active TCP connection. I'd be interested to know if there are better solutions around...
Altera_Forum
Honored Contributor
15 years ago
An update:

I fixed all or most of the problems in the driver that were causing TCP errors or slow downs and also added some optimizations with help from the FPGA engineer.

Now I get:
UDP TX 46 MBit/s, tested with 50x1KiB messages
TCP TX 38 MBit/s, TCP RX 54 MBit/s with 32KiB socket buffers, 30x128KiB messages

The changes:
110MHz processor with 32KiB caches.
Updated altera_tse.c. There was a fix in December to reschedule NAPI if there are more descriptors to process. I also changed it to disable IRQs again in this case, I think that is the correct behavior for NAPI.
TX and RX checksum offload. This is done with a custom FPGA component doing the checksums between the SGDMA and the MAC. Linux has some support for checksum offload, but I had to hack out some of the checksum calls in the TCP/IP stack to get it to never waste time on them.
Scatter-gather (NETIF_F_SG) support in the driver. Checksum offload was necessary for this, Linux won't support one without the other. Unaligned transfer support on the SGDMA was also necessary.
Define NO_TX_IRQ in the driver and fix it to work without a TX interrupt.
This is actually the important part. To get the driver to correctly work with scatter gather and no TX interrupt, I had to fix it to only restart the SGDMA in one place. Since the SGDMA doesn't provide any feedback on the descriptor it last processed, and it turned out to be very difficult to correctly keep track of this, I have to walk part of the descriptor list in the transmit function to start the SGDMA in the right place. Only restarting the SGDMA in one place made it more reliable and a lock on the TX ring is no longer needed.
Cut out some error checking in the driver.
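
For readers wondering what the checksum offload item saves: IP, TCP and UDP all use the 16-bit ones' complement Internet checksum (RFC 1071), which in software means touching every byte of the packet. The function below is a plain, unoptimized user-space sketch of that computation. It is not the kernel's implementation and not the FPGA component described above; the name inet_checksum is chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * RFC 1071 Internet checksum over a byte buffer (illustrative sketch).
 * Bytes are summed as big-endian 16-bit words; an odd trailing byte is
 * padded with a zero low byte. The 32-bit accumulator is ample for
 * packet-sized buffers.
 */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)data[0] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

A receiver can verify a packet by running the same sum over the data with the checksum field included; a result of zero means the checksum matches. On a slow soft-core CPU this per-byte loop is exactly the kind of work that is cheaper to do in hardware between the SGDMA and the MAC.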

One remaining issue is that the SGDMA always preloads the next descriptor. I think this means it will load the next descriptor for a packet that is not set up yet, but will be soon, see that it is not owned by hardware, and stop. In practice, this means it stops after sending each packet, even when that may not be necessary. Unfortunately there doesn't seem to be a way to turn off this "feature".
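
The descriptor walk mentioned in the list of changes, and the ownership check described here, can be modeled in user space roughly as follows. This is a simplified illustration only: struct tx_desc, RING_SIZE and find_restart_index are hypothetical names, not taken from altera_tse.c, and real SGDMA descriptors carry addresses, lengths and control bits that are omitted.

```c
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8   /* hypothetical ring length */

/* Minimal stand-in for a TX descriptor: only the ownership flag. */
struct tx_desc {
    bool owned_by_hw;   /* set by the driver when queued,
                           cleared by the DMA engine when sent */
};

/*
 * Walk the ring from the oldest unreclaimed descriptor up to (but not
 * including) the next free slot and return the index of the first
 * descriptor the hardware still owns. That is where a stopped DMA
 * engine would need to be restarted. Returns -1 if nothing is pending.
 * oldest == next_free is treated as an empty ring in this model.
 */
static int find_restart_index(const struct tx_desc ring[RING_SIZE],
                              size_t oldest, size_t next_free)
{
    for (size_t i = oldest; i != next_free; i = (i + 1) % RING_SIZE) {
        if (ring[i].owned_by_hw)
            return (int)i;
    }
    return -1;
}
```

Doing this walk in a single place (the transmit function, in the post above) means only one code path ever decides where the engine restarts, which fits the observation that a lock on the TX ring was no longer needed.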

I'm attaching a diff of my version of the driver against unstable-nios2mmu in case it's useful to someone.
altera_tse.diff.txt (22 KB)
Altera_Forum
Honored Contributor
15 years ago
Great !

Is this driver update for MMU or noMMU?

Hoping that the patch will reach the main stream,
-Michael
Altera_Forum
Honored Contributor
15 years ago
--- Quote Start ---
Great !

Is this driver update for MMU or noMMU?

Hoping that the patch will reach the main stream,
-Michael
--- Quote End ---

The patch is against unstable-nios2mmu (2.6.37). I think it should work for both, but I've only tested it on my tree, which has some other modifications, such as the hacking out of checksum code mentioned earlier. I'm sending the current version to nios2-dev, but I'll probably need to break it up and test against unstable before it can get accepted. It also looks like I'll need to update and move to device tree before I can do that.
Altera_Forum
Honored Contributor
15 years ago
--- Quote Start ---

Scatter-gather (NETIF_F_SG) support in the driver. Checksum offload was necessary for this, Linux won't support one without the other. Unaligned transfer support on the SGDMA was also necessary.

--- Quote End ---

Looks like the unaligned transfer feature in the SGDMA still doesn't actually work correctly, so I am getting some truncated packets with SG, particularly with telnet which produces lots of small unaligned data.
