Forum Discussion

Altera_Forum
Honored Contributor
10 years ago

Altera SGDMA & TSE

Hi all,

I am working with the TSE and am using the SGDMA with avalon_stream_to_mem descriptors. My code currently has four descriptors, and I receive network packets into a local buffer in every SGDMA callback. The issue is that I am getting low throughput, which limits my transfer speed. Does anybody know how to use the SGDMA at its best to get a higher data rate?

20 Replies

  • Altera_Forum
    Honored Contributor

    The IP stacks only use interrupts; polling usually consumes too much CPU time. For higher performance, some stacks have a high-priority thread that is only responsible for receiving the packet after an interrupt and reconfiguring the DMA for the next one, while a lower-priority thread does the actual processing of the packet.

    It's difficult to tell without the code, but from your description it looks like you have too much processing on the CPU side, and that is what's causing the packet drops. Use a profiler to find where the bottleneck is.

    Do you have to do much software processing on the Ethernet frames? If it is just simple encapsulation, you'll get better performance by doing everything in hardware instead of going through a DMA and a software stack. I've never used USB cores, but if yours has Avalon Stream interfaces it shouldn't be too complicated to set up.
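    The two-priority split described above can be sketched with a single-producer/single-consumer ring between the RX interrupt and a lower-priority processing context. This is a host-runnable mock: the sizes, names, and the simulated ISR call are illustrative, not part of any Altera API.

    ```c
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical sizes -- tune to your design. */
    #define RING_SLOTS 8          /* > max packets in flight */
    #define MAX_PKT    1536

    /* Single-producer/single-consumer ring: the RX interrupt (producer)
     * only writes 'head', the processing thread (consumer) only writes
     * 'tail', so on a single core no lock is needed. */
    struct pkt { unsigned len; unsigned char data[MAX_PKT]; };
    static struct pkt ring[RING_SLOTS];
    static volatile unsigned head, tail;

    /* Called from the SGDMA RX interrupt: stash the packet and
     * immediately re-arm the DMA for the next one. */
    static int rx_isr_enqueue(const void *buf, unsigned len)
    {
        unsigned next = (head + 1) % RING_SLOTS;
        if (next == tail)
            return -1;                      /* ring full: packet dropped */
        ring[head].len = len;
        memcpy(ring[head].data, buf, len);
        head = next;
        /* ...reprogram the SGDMA descriptor for the next packet here... */
        return 0;
    }

    /* Lower-priority context: drain the ring and do the real work
     * (e.g. wrap the frame and push it to USB). */
    static unsigned process_pending(void)
    {
        unsigned n = 0;
        while (tail != head) {
            /* process ring[tail].data / ring[tail].len here */
            tail = (tail + 1) % RING_SLOTS;
            n++;
        }
        return n;
    }

    int main(void)
    {
        unsigned char frame[64] = {0};
        for (int i = 0; i < 5; i++)
            rx_isr_enqueue(frame, sizeof frame);
        printf("processed %u packets\n", process_pending());
        return 0;
    }
    ```

    The point of the split is that the interrupt handler does the minimum (stash and re-arm), so a burst of frames isn't lost while a slow consumer (USB, in this thread's case) catches up.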
  • Altera_Forum
    Honored Contributor

    Actually, we are planning to put everything into hardware, but we assumed that if we could get good results in software first, we would then just convert that logic into hardware. When returning data to USB, I take one packet at a time from the SGDMA and wrap it. I also tried wrapping five packets into one USB packet and sending that. But the issue is that the receive packet queue gets full before all of them are forwarded to USB. So I think my USB is slower than the TSE, and that is what causes the loss of packets.

    I also used a profiler and the high-performance counter, but I only captured a single run, and in it nothing looked like a bottleneck: neither the USB nor the TSE side was eating more CPU.

    Another option I am now trying is to poll the SGDMA instead of using interrupts, so that packet loss can be controlled at the cost of bandwidth. Do you think this is a viable solution, or is putting everything into hardware the only way?
  • Altera_Forum
    Honored Contributor

    I'm not sure you'll get a significant improvement by using polling. It is true that you would save the time the CPU spends handling the interrupt, but on the other hand, if you use the Altera HAL do_sync_transfer() function to poll, the CPU can't do anything else (like sending the data to USB) while it waits for a DMA transfer. Another solution could be to bypass the HAL and use the SGDMA registers directly. When a packet is received, you configure the SGDMA for the next packet, enable it again, then process the received packet, send it through USB, and only then poll the SGDMA status and wait for a new packet. That way, if an Ethernet packet arrives while you are sending the previous one to USB, you won't waste any CPU cycles. But again, the gain compared to the interrupt-based solution could be marginal.

    The optimal way is to bypass the CPU completely and put everything in hardware.
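    The re-arm-before-processing loop described above might look roughly like this. It is a host-runnable skeleton: the three helpers are placeholders for direct SGDMA register accesses (their names are illustrative, not real HAL calls), and the "link" is simulated.

    ```c
    #include <stdio.h>

    static int packets_left = 3;                 /* simulated link traffic */

    /* Placeholders for direct register access on the legacy SGDMA:
     * program/start the next descriptor, and poll the status register. */
    static void sgdma_arm_next(void)   { /* write next_desc + control RUN */ }
    static int  sgdma_rx_done(void)    { return 1; /* read status register */ }
    static int  link_has_traffic(void) { return packets_left-- > 0; }

    int main(void)
    {
        int sent = 0;
        sgdma_arm_next();                        /* prime the first receive */
        while (link_has_traffic()) {
            while (!sgdma_rx_done())             /* wait for current packet */
                ;
            sgdma_arm_next();                    /* re-arm BEFORE processing,
                                                  * so RX overlaps the USB send */
            /* ...wrap the received frame and push it to USB here... */
            sent++;
        }
        printf("forwarded %d packets\n", sent);
        return 0;
    }
    ```

    The ordering is the whole trick: because the SGDMA is re-armed before the USB send starts, the next Ethernet frame can land in memory while the CPU is still busy forwarding the previous one.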
  • Altera_Forum
    Honored Contributor

    Hi,

    After a long time I need some help from you again. We are planning to move this whole design into hardware by removing the Nios and creating a component that takes over the Nios's role. Now I wanted to know whether it is possible to use multiple descriptors with the SGDMA. The Altera drivers here can use and process a single descriptor at a time, but if I want them to use multiple descriptors, how can that be done? And if I want to use this multiple-descriptor concept when developing the hardware, how can that be done?

    Waiting for your response...
  • Altera_Forum
    Honored Contributor

    I'm not sure what "move this whole to hardware" entails, but using the SGDMA is associated with a Nios (software) implementation. If you're actually going to move everything into a hardware implementation, you may be better off not using the SGDMA and its RAM-based descriptors; i.e. it may be simpler to write your own block to tx/rx the packet data than to write a block that programs the descriptor RAM for the Altera SGDMA.

    If you are still using the Altera-supplied InterNiche driver (ins_tse_mac.c), then they have already done the work to make the driver support multiple descriptors: search for ALTERA_TSE_SGDMA_RX_DESC_CHAIN_SIZE, for example. Or, if you're using your own driver at this point, review their tse_sgdma_read_init() function and see how it loops to create a chain of descriptors.
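    For illustration, the chain that tse_sgdma_read_init() builds can be mocked on a host like this. The struct is a minimal stand-in for alt_sgdma_descriptor; in the real driver each entry is filled in by a HAL construct-descriptor call rather than by hand, and the chain size comes from ALTERA_TSE_SGDMA_RX_DESC_CHAIN_SIZE.

    ```c
    #include <stdio.h>

    #define RX_CHAIN_SIZE 4       /* cf. ALTERA_TSE_SGDMA_RX_DESC_CHAIN_SIZE */
    #define RX_BUF_SIZE   1536

    /* Minimal stand-in for alt_sgdma_descriptor: the fields that matter
     * for chaining are the receive buffer address and the link to the
     * next descriptor. */
    struct desc {
        unsigned char *write_addr;  /* where the SGDMA stores the frame */
        struct desc   *next;        /* next descriptor; NULL ends the chain */
    };

    static struct desc chain[RX_CHAIN_SIZE + 1];  /* +1: terminal entry */
    static unsigned char bufs[RX_CHAIN_SIZE][RX_BUF_SIZE];

    /* Mirror of the loop idea in tse_sgdma_read_init(): each descriptor
     * gets its own receive buffer and links to the one after it; the
     * extra entry at the end terminates the chain. */
    static void build_rx_chain(void)
    {
        for (int i = 0; i < RX_CHAIN_SIZE; i++) {
            chain[i].write_addr = bufs[i];
            chain[i].next       = &chain[i + 1];
        }
        chain[RX_CHAIN_SIZE].next = NULL;     /* end-of-chain marker */
    }

    int main(void)
    {
        build_rx_chain();
        int n = 0;
        for (struct desc *d = &chain[0]; d->next != NULL; d = d->next)
            n++;
        printf("usable descriptors: %d\n", n);
        return 0;
    }
    ```

    With a chain like this, the SGDMA can receive several back-to-back frames without software intervention, which is exactly what a single-descriptor setup cannot do.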
  • Altera_Forum
    Honored Contributor

    Hi ted,

    Since you say there is already support for multiple descriptors in the InterNiche drivers, can you tell me how to use them with multiple descriptors and/or a chain? I tried calling altera_sgdma_stream_to_mem_descriptor() but couldn't get it working. If we increase the value stored in ALTERA_TSE_SGDMA_RX_DESC_CHAIN_SIZE to 2 or more, then it becomes a chain with that many descriptors. But suppose I want to use the SGDMA at its maximum capacity: how should I go about multiple chains or descriptors?

    Another question: there are many references, like the uC/OS-II InterNiche stack, the lwIP stack, and the standalone SGDMA software, but none of them uses multiple-chain logic. Is there a specific reason for this, or do people generally not prefer to use multiple chains?
  • Altera_Forum
    Honored Contributor

    --- Quote Start ---

    Another question: there are many references, like the uC/OS-II InterNiche stack, the lwIP stack, and the standalone SGDMA software, but none of them uses multiple-chain logic. Is there a specific reason for this, or do people generally not prefer to use multiple chains?

    --- Quote End ---

    I don't think any of these stacks is optimized out of the box for absolute highest performance. They work well for their intended use, with modest performance on a modest processor. I think there is a general understanding that if you want to saturate one or more 1000 Mbps links, you aren't going to do it with a ~100 MHz processor and software. See http://www.alterawiki.com/wiki/nios_ii_udp_offload_example

    --- Quote Start ---

    Hi ted,

    Since you say there is already support for multiple descriptors in the InterNiche drivers, can you tell me how to use them with multiple descriptors and/or a chain? I tried calling altera_sgdma_stream_to_mem_descriptor() but couldn't get it working. If we increase the value stored in ALTERA_TSE_SGDMA_RX_DESC_CHAIN_SIZE to 2 or more, then it becomes a chain with that many descriptors. But suppose I want to use the SGDMA at its maximum capacity: how should I go about multiple chains or descriptors?

    --- Quote End ---

    This is largely a repeat of the good advice Daixiwen already gave you earlier in this thread:

    Software driving the SGDMA at its maximum capacity would consist of a (very?) long circular chain of descriptors stored in dual-port on-chip RAM, with the other port of the RAM connected to the Nios tightly coupled data memory interface. The software would basically consist of initializing the chain and starting the SGDMA, and then continuously processing the chain looking for completed descriptors. You can do that either by foreground polling of the chain, or by having the SGDMA raise an interrupt on each descriptor and having your ISR process every completed descriptor each time it takes an interrupt.

    To do this, you will need to overcome whatever issue you ran into with the HAL API's altera_sgdma_stream_to_mem_descriptor() etc., as you're going beyond any of the readily available examples.

    You can get quite far with an approach like this; however, at some point you will want to consider other options, including the Modular SGDMA and developing custom IP for your fixed function.
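    The foreground-polling variant of that circular chain can be sketched as follows. This is a host-runnable mock: the 'done' flag stands in for the status byte the SGDMA writes into each descriptor on completion, and hw_complete() simulates the hardware side.

    ```c
    #include <stdio.h>

    #define NSLOTS 4

    /* Stand-in descriptor: the real alt_sgdma_descriptor carries a
     * status byte the SGDMA writes when it finishes a transfer; here
     * a 'done' flag plays that role. */
    struct desc { int done; int bytes; };

    static struct desc ring[NSLOTS];
    static int next_to_reap;      /* software's position in the ring */

    /* Simulate the hardware completing a descriptor. */
    static void hw_complete(int slot, int nbytes)
    {
        ring[slot].done  = 1;
        ring[slot].bytes = nbytes;
    }

    /* Foreground polling loop: reap every completed descriptor in
     * order, hand its buffer off, then clear the status so the slot
     * is available to the hardware again. */
    static int reap_completed(void)
    {
        int n = 0;
        while (ring[next_to_reap].done) {
            /* ...process ring[next_to_reap]'s buffer here... */
            ring[next_to_reap].done = 0;   /* return slot to hardware */
            next_to_reap = (next_to_reap + 1) % NSLOTS;
            n++;
        }
        return n;
    }

    int main(void)
    {
        hw_complete(0, 64);
        hw_complete(1, 1500);
        printf("reaped %d descriptors\n", reap_completed());
        return 0;
    }
    ```

    Because the chain is circular, software never rebuilds descriptors; it only advances its reap pointer and clears status, which keeps the per-packet CPU cost close to the minimum.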
  • Msg06484
    New Contributor

    Hi,

    I hope it is OK to append to this thread; I am new to this forum. My question is similar but more basic, so I thought this might be the place to ask.

    I have the lwip core. I have the sample design for the Cyclone IV E, found at https://github.com/adwinying/lwIP-NIOSII/tree/master/FPGA/software/lwIP_NIOS_II_Example

    I am trying to develop this for the Arria 10 in Quartus 18.1.

    I first instantiated the SGDMA and found that lwIP does not like that...

    I then found that it does like the mSGDMA, so I instantiated that, but there are still errors around symbols like ALTERA_TSE_SGDMA_INTR_MASK and ALTERA_TSE_FIRST_RX_MSGDMA_DESC_OFST.

    To be honest, I also instantiated on-chip memory (RAM) for the DMA, and a descriptor_memory ROM.

    I am the only person in the company on this project right now. I have access to corporate tech support at Intel, but we are struggling.

    Does anyone know of a sample design that would be a better fit for lwIP on Quartus 18 (possibly with the Arria 10)?

    Thanks in advance.