Altera_Forum
16 years ago

Ethernet performance using TSE in uClinux

Hi all,

I am measuring Ethernet performance on a Nios II based board with an Altera TSE IP core running uClinux, and it has been disappointing so far. I've timed both TCP and UDP and barely reach 12 Mbit/s.

I did some timings using UDP with a 1024-byte message on a bind()ed and connect()ed socket:

Total time to send a message (from the call to sendto() or write() until it returns): 650us

Time spent in the driver (using atse.c driver, altera_tse.c driver gives similar performance): 140us

Time in the driver waiting for the hardware to send data: 80us

(seems it's doing 100Mbit instead of GbE, but that's a relatively small issue here considering overall performance)

Kernel overhead before entering driver code: ~400us

Kernel overhead after exiting driver code: ~100us
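
As a sanity check on the figures above, the per-message times can be converted to throughput. The helper below is only an illustration (the name `mbit_per_s` is made up here); it relies on bits per microsecond being numerically the same as Mbit/s.

```c
#include <stddef.h>

/* Throughput in Mbit/s for `bytes` of payload moved in `microseconds`.
 * One bit per microsecond equals one Mbit/s, so no scaling is needed. */
static double mbit_per_s(size_t bytes, double microseconds)
{
    return (double)bytes * 8.0 / microseconds;
}
```

With the numbers from the post, `mbit_per_s(1024, 650)` comes to about 12.6 Mbit/s, which matches the observed ceiling, and `mbit_per_s(1024, 80)` gives 102.4 Mbit/s, consistent with the link running at 100 Mbit rather than GbE. The components (roughly 400us before the driver, 140us in it, 100us after) also add up to about the 650us total.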

Timing was done by writing to an output pin and using an oscilloscope.
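
For anyone without a spare pin and a scope, a rough software-only version of the "total time to send" measurement might look like the sketch below. It is not the method used above: it times a single send() on a bind()ed and connect()ed UDP socket with clock_gettime(), so it can only see the total, not the split between kernel and driver, and the clock resolution on a given Nios II system may be too coarse to trust. The function name and the destination parameters are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Send one `len`-byte datagram on a bind()ed and connect()ed UDP socket.
 * Returns the duration of the send() call in microseconds, or a negative
 * value on any error. dst_ip/dst_port are placeholders for the test peer. */
static double time_udp_send_us(const char *dst_ip, unsigned short dst_port,
                               size_t len)
{
    char buf[2048];
    struct sockaddr_in local, remote;
    struct timespec t0, t1;
    ssize_t n;
    int fd;

    if (len > sizeof(buf))
        return -1.0;
    memset(buf, 0xA5, len);

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1.0;

    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY);
    local.sin_port = 0;                 /* let the kernel pick a port */

    memset(&remote, 0, sizeof(remote));
    remote.sin_family = AF_INET;
    remote.sin_port = htons(dst_port);

    if (inet_pton(AF_INET, dst_ip, &remote.sin_addr) != 1 ||
        bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0 ||
        connect(fd, (struct sockaddr *)&remote, sizeof(remote)) < 0) {
        close(fd);
        return -1.0;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    n = send(fd, buf, len, 0);          /* the call being timed */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);

    if (n != (ssize_t)len)
        return -1.0;
    return (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_nsec - t0.tv_nsec) / 1e3;
}
```

Averaging over many calls would give a steadier figure than a single sample; the pin-and-scope approach remains the only one of the two that can attribute time to individual stages.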

It seems there's just a lot of overhead using the Linux IP stack on the Nios II. Has anybody been able to get better performance, or is it hopeless given the speed of the processor?

55 Replies

  • Altera_Forum

    Do you have the same problem with UDP? Is it really corruption, or lost packets?

    I had a problem a while ago with slow TCP connections on both ucOS and eCos, where a lot of TCP packets were lost, causing lots of retransmissions and a very low speed.

    I found out that the IP stack was too slow to process the received packets, so the TSE FIFO filled up and some packets were dropped.

    The workaround I found was to increase the TSE receive FIFO so that it was larger than the TCP window size, but this only works if you have only one active TCP connection. I'd be interested to know if there are better solutions around...
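
The single-connection limit follows from simple arithmetic: each TCP sender can have up to one receive window of unacknowledged data in flight, so with several connections the worst case is the sum of the windows. A toy check under that assumption (the function name and sizes are illustrative, and frame headers and non-TCP traffic are ignored):

```c
#include <stdbool.h>
#include <stddef.h>

/* True if the receive FIFO can absorb a full window from every connection
 * at once. Assumes all connections advertise the same window and ignores
 * per-frame header overhead and any non-TCP traffic. */
static bool fifo_covers_windows(size_t fifo_bytes, size_t window_bytes,
                                unsigned connections)
{
    return fifo_bytes > window_bytes * connections;
}
```

For example, a hypothetical 32 KiB FIFO covers one 16 KiB window (`fifo_covers_windows(32768, 16384, 1)` is true) but not two.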
  • Altera_Forum

    An update:

    I fixed all or most of the problems in the driver that were causing TCP errors or slowdowns, and also added some optimizations with help from the FPGA engineer.

    Now I get:

    UDP TX 46 Mbit/s, tested with 50x1KiB messages

    TCP TX 38 Mbit/s, TCP RX 54 Mbit/s with 32KiB socket buffers, 30x128KiB messages
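
The post does not say how the 32KiB socket buffers were configured. One common way, shown here as an assumption rather than what was actually done, is setsockopt() with SO_SNDBUF and SO_RCVBUF; the helper name is made up.

```c
#include <sys/socket.h>

/* Request `bytes` for both the send and receive buffer of socket `fd`.
 * Returns 0 on success, -1 if either setsockopt() call fails. */
static int set_socket_buffers(int fd, int bytes)
{
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    return 0;
}
```

Calling `set_socket_buffers(fd, 32 * 1024)` before connect() or listen() would request 32KiB each way. Note that Linux doubles the requested value to allow for bookkeeping overhead, so getsockopt() reports twice what was asked for.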

    The changes:

    • 110MHz processor with 32KiB caches.

    • Updated altera_tse.c. There was a fix in December to reschedule NAPI if there are more descriptors to process. I also changed it to disable IRQs again in this case, which I think is the correct behavior for NAPI.

    • TX and RX checksum offload. This is done with a custom FPGA component that computes the checksums between the SGDMA and the MAC. Linux has some support for checksum offload, but I had to hack out some of the checksum calls in the TCP/IP stack to get it to never waste time on them.

    • Scatter-gather (NETIF_F_SG) support in the driver. Checksum offload was necessary for this; Linux won't support one without the other. Unaligned transfer support on the SGDMA was also necessary.

    • Define NO_TX_IRQ in the driver and fix it to work without a TX interrupt.

    • This is actually the important part. To get the driver to work correctly with scatter-gather and no TX interrupt, I had to fix it to restart the SGDMA in only one place. The SGDMA doesn't provide any feedback on the descriptor it last processed, and keeping track of this correctly turned out to be very difficult, so I have to walk part of the descriptor list in the transmit function to start the SGDMA in the right place. Restarting the SGDMA in only one place made it more reliable, and a lock on the TX ring is no longer needed.

    • Cut out some error checking in the driver.
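
To give a sense of what checksum offload saves, the sketch below is a plain software version of the 16-bit ones' complement Internet checksum (RFC 1071) used by IP, TCP and UDP. It is not code from the driver or the kernel; it just shows the per-byte work the CPU no longer has to do once the FPGA component handles it.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over `len` bytes, treating the data as
 * big-endian 16-bit words. The 32-bit accumulator is safe for buffers
 * well beyond any Ethernet frame size. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;   /* pad odd byte with zero */

    while (sum >> 16)                           /* fold carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;
}
```

For the example bytes in RFC 1071 (00 01 f2 03 f4 f5 f6 f7) this yields 0x220d. Every payload byte passes through the loop once, which is a noticeable cost on a 110MHz soft core.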

    One remaining issue is that the SGDMA always preloads the next descriptor. I think this means it will load the next descriptor for a packet that is not set up yet (but will be soon), see that it is not owned by hardware, and stop. In practice, it stops after sending each packet, even when that may not be necessary. Unfortunately there doesn't seem to be a way to turn off this "feature".
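
The ownership behaviour described here, and the descriptor walk from the transmit function, can be pictured with a small userspace model. Everything in it is hypothetical: the struct is not the real SGDMA descriptor layout, and `dma_run` and `find_restart` are illustrative names, not functions from the attached driver.

```c
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8

/* Toy descriptor: the driver sets owned_by_hw when it queues a packet,
 * and the modelled hardware clears it once the packet has been sent. */
struct tx_desc {
    bool owned_by_hw;
};

/* Model of the DMA engine: send descriptors starting at `start` and stop
 * at the first one not owned by hardware. Returns the number sent. */
static size_t dma_run(struct tx_desc *ring, size_t start)
{
    size_t sent = 0;
    size_t i = start % RING_SIZE;

    while (sent < RING_SIZE && ring[i].owned_by_hw) {
        ring[i].owned_by_hw = false;
        sent++;
        i = (i + 1) % RING_SIZE;
    }
    return sent;
}

/* Walk from the oldest slot `tail` towards `head` and return the first
 * descriptor still owned by hardware, i.e. where the DMA must be
 * restarted. Returns `head` if nothing is pending. */
static size_t find_restart(const struct tx_desc *ring, size_t tail,
                           size_t head)
{
    size_t i = tail % RING_SIZE;

    while (i != head % RING_SIZE && !ring[i].owned_by_hw)
        i = (i + 1) % RING_SIZE;
    return i;
}
```

In this model, queuing one packet and running the DMA sends exactly one descriptor and then stops on the next, unowned slot. If two more packets are queued afterwards, the walk from the tail finds the first pending slot and a single restart from there sends both. The full-ring case, where head and tail coincide, is deliberately left out.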

    I'm attaching a diff of my version of the driver against unstable-nios2mmu in case it's useful to someone.
  • Altera_Forum

    Great!

    Is this driver update for MMU or noMMU?

    Hoping that the patch will reach the mainline,

    -Michael
  • Altera_Forum

    --- Quote Start ---

    Great!

    Is this driver update for MMU or noMMU?

    Hoping that the patch will reach the mainline,

    -Michael

    --- Quote End ---

    The patch is against unstable-nios2mmu (2.6.37). I think it should work for both, but I've only tested it on my tree, which has some other modifications, such as the checksum code being hacked out as mentioned. I'm sending the current version to nios2-dev, but I'll probably need to break it up and test against unstable before it can get accepted. It also looks like I'll need to update and move to device tree before I can do that.
  • Altera_Forum

    --- Quote Start ---

    • Scatter-gather (NETIF_F_SG) support in the driver. Checksum offload was necessary for this; Linux won't support one without the other. Unaligned transfer support on the SGDMA was also necessary.

    --- Quote End ---

    Looks like the unaligned transfer feature in the SGDMA still doesn't actually work correctly, so I am getting some truncated packets with SG, particularly with telnet, which produces lots of small unaligned buffers.