Altera_Forum
16 years ago
Igor,
This question comes up a lot on the lwIP forums. Unfortunately it is often answered with "lwIP is lightweight and performance is secondary", and IMO the performance side of lwIP is neglected. It could perform *way* better. I spent months optimizing lwIP, the Altera drivers and my own code to get the performance we require from the 100MHz NIOS II we're running. Without that effort our product line for this would have been canned. Out-of-the-box performance is poor with the Altera hardware and lwIP.
I will try to list these in order of importance:
- Optimize with -O3; as you saw, it helps a lot. As difficult as it is, I debug this way. For easier debugging without a huge hit in performance, use -O1.
- Use the inline IP header checksum in lwIP (I contributed this by the way). It helps a lot.
- Do the UDP/TCP checksum in Verilog/VHDL (in hardware). If you can't, use assembly code for inet_chksum. Failing that, use the optimized (option 3) C inet_chksum. Or simply disable UDP checksumming in lwipopts.h. In practice UDP tends to drop packets rather than change bytes in packets, so running with UDP checksumming disabled will not normally be an issue.
- Replace SMEMCPY with an efficient inline memory copy.
- Do the following code/data relocations:
  - Put inet_chksum in onchip RAM (if you use it)
  - Put ethernetif in onchip RAM
  - Put ethernetif->lwipRxPbuf in onchip RAM
  - Put tse in onchip RAM
  - Put tse_mac_device in onchip RAM
  - Use separate memory pools and put PBUF_POOL in onchip RAM
  - Put netif_list and netif_default in onchip RAM
  - Put pbuf_header in onchip RAM
  - Put lwip_stats in onchip RAM
  - Put arp_table in onchip RAM
  - Put find_entry in onchip RAM
  - Put etharp_send_ip in onchip RAM
  - Put etharp_find_addr in onchip RAM
  - Put etharp_output in onchip RAM
  - Put etharp_query in onchip RAM
- Use udp_sendto_if to send UDP packets.
- Replace memcpy with a more efficient memcpy than Altera's.
- Remove the memory copy for unaligned transfers in lwip_tse_mac.c.
- Use chained SGDMA transfers.
- Don't wait for a packet to be sent - use pbuf reference counts and delete the previously sent pbuf on the next pbuf send. This removes the wait for completion of each packet sent.
- Rewrite/refactor the SGDMA driver - it's very inefficient.
- Rewrite/refactor the TSE driver - it's very inefficient.
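A minimal sketch of how the code/data relocations in the list above might be expressed with GCC section attributes. The section names assume a BSP memory region called onchip_ram, and ONCHIP_CODE, ONCHIP_DATA, USE_ONCHIP_SECTIONS, hot_table and hot_lookup are all made-up names for illustration. In a real build the attributes would go on the lwIP and driver symbols listed above, with whatever section names your linker script actually defines.

```c
#include <stdint.h>

/* Assumption: the linker script has a memory region named "onchip_ram",
 * so its sections start with ".onchip_ram". Adjust to your own system.
 * Define USE_ONCHIP_SECTIONS only when building for the target. */
#ifdef USE_ONCHIP_SECTIONS
#define ONCHIP_CODE __attribute__((section(".onchip_ram.txt")))
#define ONCHIP_DATA __attribute__((section(".onchip_ram.rwdata")))
#else
#define ONCHIP_CODE   /* host build: leave placement to the default sections */
#define ONCHIP_DATA
#endif

/* Hypothetical hot-path table, standing in for something like arp_table. */
ONCHIP_DATA static uint32_t hot_table[16] = { 1, 2, 3 };

/* Hypothetical hot-path function, standing in for something like find_entry. */
ONCHIP_CODE static uint32_t hot_lookup(unsigned i)
{
    return hot_table[i & 15u];   /* mask keeps the index inside the table */
}
```

Keeping the attributes behind macros means the same source still builds on a host for testing, where the custom sections don't exist.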
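For the checksum and memory pool items, the lwipopts.h side might look roughly like this. The option names come from lwIP's opt.h, but not every release has all of them (MEMP_SEPARATE_POOLS, for example, is assumed here from the 1.4.x series), so check them against your own tree before relying on this.

```c
/* lwipopts.h fragment (a sketch; verify each option against opt.h
 * in the lwIP version you are actually building). */

/* Skip UDP checksum generation and verification in software. */
#define CHECKSUM_GEN_UDP        0
#define CHECKSUM_CHECK_UDP      0

/* If checksums stay in software, select the third C implementation. */
#define LWIP_CHKSUM_ALGORITHM   3

/* Assumption: give each memp pool its own array so that a single pool
 * such as PBUF_POOL can be located separately from the others. */
#define MEMP_SEPARATE_POOLS     1
```

Separating the pools only makes it possible to place PBUF_POOL on its own; actually putting it in onchip RAM still needs a section attribute or linker script change in your port.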
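To make the checksum item concrete, here is a small portable sketch of the ones' complement Internet checksum (RFC 1071) with the carry fold deferred to the end. chksum16 is a hypothetical helper, not lwIP's inet_chksum and not the optimized variant mentioned above. It only shows the arithmetic that any hardware, assembly or C version has to reproduce.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones' complement Internet checksum (RFC 1071) over len bytes.
 * Hypothetical helper: reads the data as big-endian 16-bit words and
 * returns the checksum in host byte order. The 32-bit accumulator is
 * enough for packet-sized inputs (well under 128 KB). */
static uint16_t chksum16(const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    uint32_t sum = 0;

    /* Accumulate 16-bit words; carries pile up in the upper half. */
    while (len >= 2) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1) {
        sum += (uint32_t)p[0] << 8;   /* odd trailing byte, zero padded */
    }

    /* Fold the deferred carries back into the low 16 bits. */
    while (sum >> 16) {
        sum = (sum & 0xFFFFu) + (sum >> 16);
    }
    return (uint16_t)~sum;
}
```

A faster C version would typically read aligned 32-bit words and unroll the loop, which is where the byte-wise loop above leaves most of its performance on the table.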
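The "don't wait for a packet to be sent" item can be sketched as follows. This uses a stand-in struct pkt with its own reference count rather than lwIP's struct pbuf, and pkt_ref, pkt_free, dma_start_tx, tx_send and in_flight are all hypothetical names. In lwIP itself the corresponding calls would be pbuf_ref() and pbuf_free().

```c
#include <stddef.h>

/* Stand-in for a reference-counted packet buffer (not lwIP's struct pbuf). */
struct pkt {
    int ref;     /* reference count */
    int freed;   /* set once the last reference is dropped (illustration only) */
};

static void pkt_ref(struct pkt *p)  { p->ref++; }
static void pkt_free(struct pkt *p) { if (--p->ref == 0) p->freed = 1; }

/* The buffer handed to the DMA engine by the previous send, if any. */
static struct pkt *in_flight = NULL;

/* Hypothetical hardware hook: start a transfer and return without waiting. */
static void dma_start_tx(struct pkt *p) { (void)p; }

static void tx_send(struct pkt *p)
{
    /* Release the previous packet now instead of blocking after its send.
     * A real driver must first confirm the DMA engine has finished with it. */
    if (in_flight != NULL) {
        pkt_free(in_flight);
    }
    pkt_ref(p);        /* keep p alive while the hardware reads it */
    in_flight = p;
    dma_start_tx(p);
}
```

The caller can drop its own reference as soon as tx_send returns; the buffer stays alive until the next send releases the driver's reference.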