Forum Discussion

grspbr (Occasional Contributor)
3 years ago

MCDMA/PCIE max number of channels - how to get 256 physical channels

Hi @Wincent_Altera ,

I am recreating this question because the older one was closed. There is a problem with notifications. I am not getting notifications of replies to my posts. How can I correct this?

To get to the problem: the MCDMA/PCIE core has 11 bits of DMA channel address when using multiple logical channels on one physical interface. We planned to use 256 physical channels but probably only a dozen or so logical channels. However, when we try to drive it using Intel's DPDK, we seem to get corruption of the D2H DMA descriptors if we use more than 64 channels. Why are we seeing this limitation? Is it in the DPDK itself? Also, it seems we cannot create logical channels and map them to physical channel addresses of our choosing: creating channels automatically uses sequential physical channel addresses starting from 0. Is it not possible to map logical channels to physical channels?
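As a quick sanity check on the numbers in this question, the arithmetic can be sketched in Python. An 11-bit channel address can encode 2048 distinct channels, while 256 and 64 channels need 8 and 6 bits respectively. The function names are illustrative only and are not part of any Intel tool or driver.

```python
def addressable_channels(address_bits: int) -> int:
    """Number of distinct channel IDs an address field of this width can encode."""
    if address_bits < 0:
        raise ValueError("address_bits must be non-negative")
    return 1 << address_bits


def bits_needed(channels: int) -> int:
    """Minimum address width that can encode IDs 0 .. channels-1."""
    if channels < 1:
        raise ValueError("channels must be >= 1")
    # (channels - 1) is the largest ID; its bit length is the width required.
    return max(1, (channels - 1).bit_length())


print(addressable_channels(11))  # 2048
print(bits_needed(256))          # 8
print(bits_needed(64))           # 6
```

So a 256-channel configuration is well inside what an 11-bit channel address can represent; the address width by itself does not explain a limit at 64.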

To answer your question, we are using Quartus 21.2 for Agilex AGFB014 (P+E Tile).

In our build settings, we have:

• PCIe0 Settings/PCIe0 IP Settings/PCIe0 PCI Express/PCI Capabilities/PCIe0 Device/PF0 = 256
• PCIe0 Settings/PCIe0 IP Settings/MCDMA Settings/D2H Prefetch channels = 256
• Maximum Descriptor Fetch = 16

32 Replies

  • Wincent_Altera (Regular Contributor)

    Hi,


    Thanks for reaching out again.


    1. Did you try running the example design without modification? Does it behave the same way?
    2. What payload size are you using for the test?
    3. Did you develop your own driver, or are you using our example design driver?
    4. Did you capture the signals below to check whether the data mismatches?

    pcie_ed_tb.pcie_ed_inst.dut.dut.ast.p0_rx_st_valid_o[1:0]

    pcie_ed_tb.pcie_ed_inst.dut.dut.ast.p0_rx_st_data_o[511:0]

    pcie_ed_tb.pcie_ed_inst.dut.dut.ast.p0_rx_st_ready_i

    pcie_ed_tb.pcie_ed_inst.dut.dut.ast.p0_rx_st_hdr_o[255:0]

    pcie_ed_tb.pcie_ed_inst.dut.dut.ast.p0_rx_st_sop_o[1:0]

    pcie_ed_tb.pcie_ed_inst.dut.dut.ast.p0_tx_st_data_i[511:0]

    pcie_ed_tb.pcie_ed_inst.dut.dut.ast.p0_tx_st_eop_i[1:0]

    pcie_ed_tb.pcie_ed_inst.dut.dut.ast.p0_tx_st_sop_i[1:0]

    pcie_ed_tb.pcie_ed_inst.dut.dut.ast.p0_tx_st_ready_o

    pcie_ed_tb.pcie_ed_inst.dut.dut.ast.p0_tx_st_valid_i[1:0]

    pcie_ed_tb.pcie_ed_inst.dut.dut.ast.p0_tx_st_hdr_i[255:0]


    Looking forward to hearing back from you.

    Regards,

    Wincent_Intel


    • grspbr (Occasional Contributor)

      Hi @Wincent_Altera, thanks for the reply.

      We originally used the example design without modification and it worked, with both the example driver and our driver.

      Then we removed the packet generator/checker from the example to create our own design which also worked with both our driver and example design driver.

      Then we tried to change the MCDMA/PCIE core from 64 channels to 256, and this is where we started having problems. We used payloads of 8K for the 64-channel case and 2K for the 256-channel case.

      Why does the example design only use 64 channels? Why not 256 or the max 2K?

      We are now in the process of trying the example design customized for 256 channels. We will try to get reports from the example driver to you.

      We do monitor those signals in our design with Signal Tap but do not yet have a capture of a failure. We'll keep trying.

      regards,

      Greg

      • Wincent_Altera (Regular Contributor)

        Hi @grspbr ,

        If you refer to the user guide, section 1.2. Known Issues states that the Multichannel D2H AVST configuration has stability issues when the total number of D2H channels configured is greater than 256. Please consider this in your design.

        Also, please refer to section 3.1.6.1. Avalon-ST 1-Port Mode:

        • In the current Intel® Quartus® Prime release, the D2H Prefetch Channels follows the total number of DMA channels that you select up to 256 total channels.
        • When the total number of channels selected is greater than 256, then D2H Prefetch channels are fixed to 64.
        • The resource utilization shall increase with the number of D2H prefetch channels.
        • For details about these parameters, refer to the 4.12.2. D2H Data Mover Interface
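The two prefetch-channel bullets above can be written as a small rule. This is a minimal Python sketch of the quoted user guide text only. The function name is hypothetical, and the actual IP parameter editor may restrict the selectable channel counts further.

```python
def d2h_prefetch_channels(total_dma_channels: int) -> int:
    """D2H prefetch channel count implied by the quoted user guide bullets.

    Up to 256 total channels: prefetch channels follow the total.
    More than 256 total channels: prefetch channels are fixed to 64.
    """
    if total_dma_channels < 1:
        raise ValueError("total_dma_channels must be >= 1")
    if total_dma_channels <= 256:
        return total_dma_channels
    return 64


for total in (64, 256, 512, 2048):
    print(total, "->", d2h_prefetch_channels(total))
```

Under this reading, a 256-channel build with D2H Prefetch channels = 256 is consistent with the documented behaviour, and the fixed value of 64 only applies above 256 total channels.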

        I hope this answers your question.

        Regards,

        Wincent_Intel

  • Wincent_Altera (Regular Contributor)

    Hi,

    I wish to follow up with you about this case.

    Do you have any further questions on this matter?

    Otherwise, I would like your permission to close this forum ticket.

    Regards,

    Wincent_Intel


    • grspbr (Occasional Contributor)

      Hi Wincent, I still need to review your last response. I should have something tomorrow.

      Regards,

      Greg

  • Wincent_Altera (Regular Contributor)

    Hi,

    I wish to follow up with you about this IPS case.

    I hope to hear back from you so that we can proceed to the next step.

    Regards,

    Wincent_Intel


  • Wincent_Altera (Regular Contributor)

    Hi Greg,


    Thanks for the .qar file and evidence.

    I have raised an internal engineering ticket for this.

    The related team will work on it. I will follow up closely and let you know if there is any update.


    Regards,

    Wei Chuan


  • Wincent_Altera (Regular Contributor)

    Hi Greg,


    We might need perfq_log_20230106_143546.txt and the 22.4 ED qar file.

    Can you put them on Google Drive and share the link with me?


    Regards,

    Wei Chuan


    • Wincent_Altera (Regular Contributor)

      Hi Greg,

      I can access your link.
      Can you please also attach perfq_log_20230106_143546.txt for Quartus v22.4, with the 256-channel failure log?

      Regards,

      Wincent_Intel.

      • grspbr (Occasional Contributor)

        Hi Wincent,

        Unfortunately, I no longer have the compiled project for v22.4, but it behaved exactly the same as v21.2. It was an Intel example project in which I only changed the settings to 256 channels.

        Regards,

        Greg

  • Wincent_Altera (Regular Contributor)

    Hi Greg,


    I have tested the 22.4 acs branch and 22.4 acs Qshell using the design example from Quartus with the given command, and I did not observe any issue.

    Please find the bitstream and release software in the attachment. Also, please confirm whether you are using a DCA-enabled ZTE sof. If so, this Quartus build won't support the DCA feature.


    Test output:

    [root@BAPVECISE040T perfq]# ./build/mcdma-test -- -b 0000:01:00.0 -p 8192 -l 2 -z -c 64 -a 1 -d 4

    EAL: Detected 16 lcore(s)

    EAL: Detected 1 NUMA nodes

    EAL: Multi-process socket /var/run/dpdk/rte/mp_socket

    EAL: Selected IOVA mode 'PA'

    EAL: Probing VFIO support...

    EAL: PCI device 0000:01:00.0 on NUMA socket -1

    EAL: Invalid NUMA socket, default to 0

    EAL: probe driver: 1172:0 net_mcdma

    PMD: ifc_mcdma_get_hw_version(): MCDMA RTL VERSION : 0x50001

    PMD: ifc_mcdma_get_device_caps(): Max Supported by DPDK: 1024

    EAL: PCI device 0000:04:00.0 on NUMA socket -1

    EAL: Invalid NUMA socket, default to 0

    EAL: probe driver: 8086:15f3 net_igc

    EAL: PCI device 0000:05:00.0 on NUMA socket -1

    EAL: Invalid NUMA socket, default to 0

    EAL: probe driver: 8086:15f3 net_igc

    Allocating 64 Channels...


    ----------------------------------------------------

    BDF: 0000:01:00.0

    Channels Allocated: 64

    QDepth 508

    Number of pages: 8

    Completion mode: WB

    H2D Payload Size per descriptor: 8192 Bytes

    D2H Payload Size per descriptor: 8192 Bytes

    H2D SOF on descriptor: 1

    H2D EOF on descriptor: 1

    H2D File Size: 8192 Bytes

    D2H SOF on descriptor: 1

    D2H EOF on descriptor: 1

    D2H File Size: 8192 Bytes

    PKG Gen Files: 1

    File Size: 8192 Bytes

    Tx Batch Size: 127 Descriptors

    Rx Batch Size: 127 Descriptors

    TID FIFO Checks: OFF

    AVST Example Design Interface

    DCA: OFF

    ----------------------------------------------------------

    Thread initialization in progress ...

    Thread is in READY state...

    Thread initialization done

    All Threads exited

    TIME OUT while waiting for completions

    Leaving...

    -------------------------------------OUTPUT SUMMARY------------------------------------------

    Dir #queues Time_elpsd B_trnsfrd TBW d_drop_cnt Passed Failed %passed

    Tx 64 00:03:457 26185368.00KB 17.11GBPS 0 64 0 100.00%

    Rx 64 00:03:457 24388064.00KB 09.46GBPS 0 64 0 100.00%

    ----------------------------------------------------------------------------------------------

    ---------------------------------------------------------------------------------------------

    ---------------------------------------------------------------------------------------------

    Total Bandwidth: 26.57GBPS, 3.48MPPS

    Total TX Bandwidth: 17.11GBPS, 2.24MPPS

    Total RX Bandwidth: 9.46GBPS, 1.24MPPS

    Total data drop count :0

    ---------------------------------------------------------------------------------------------

    Full Forms:

    TBW: Total Bandwidth

    IBW: Interval Bandwidth

    MIBW: Mean Interval Bandwidth

    HIBW: Highest Interval Bandwidth

    LIBW: Lowest Interval Bandwidth

    Please refer to perfq_log_20230125_120711.txt for more details

    total_drops:0

    [root@BAPVECISE040T perfq]# ./build/mcdma-test -- -b 0000:01:00.0 -p 8192 -l 2 -z -c 65 -a 1 -d 4

    EAL: Detected 16 lcore(s)

    EAL: Detected 1 NUMA nodes

    EAL: Multi-process socket /var/run/dpdk/rte/mp_socket

    EAL: Selected IOVA mode 'PA'

    EAL: Probing VFIO support...

    EAL: PCI device 0000:01:00.0 on NUMA socket -1

    EAL: Invalid NUMA socket, default to 0

    EAL: probe driver: 1172:0 net_mcdma

    PMD: ifc_mcdma_get_hw_version(): MCDMA RTL VERSION : 0x50001

    PMD: ifc_mcdma_get_device_caps(): Max Supported by DPDK: 1024

    EAL: PCI device 0000:04:00.0 on NUMA socket -1

    EAL: Invalid NUMA socket, default to 0

    EAL: probe driver: 8086:15f3 net_igc

    EAL: PCI device 0000:05:00.0 on NUMA socket -1

    EAL: Invalid NUMA socket, default to 0

    EAL: probe driver: 8086:15f3 net_igc

    Allocating 65 Channels...


    ----------------------------------------------------

    BDF: 0000:01:00.0

    Channels Allocated: 65

    QDepth 508

    Number of pages: 8

    Completion mode: WB

    H2D Payload Size per descriptor: 8192 Bytes

    D2H Payload Size per descriptor: 8192 Bytes

    H2D SOF on descriptor: 1

    H2D EOF on descriptor: 1

    H2D File Size: 8192 Bytes

    D2H SOF on descriptor: 1

    D2H EOF on descriptor: 1

    D2H File Size: 8192 Bytes

    PKG Gen Files: 1

    File Size: 8192 Bytes

    Tx Batch Size: 127 Descriptors

    Rx Batch Size: 127 Descriptors

    TID FIFO Checks: OFF

    AVST Example Design Interface

    DCA: OFF

    ----------------------------------------------------------

    Thread initialization in progress ...

    Thread is in READY state...

    Thread initialization done

    All Threads exited

    TIME OUT while waiting for completions

    Leaving...

    -------------------------------------OUTPUT SUMMARY------------------------------------------

    Dir #queues Time_elpsd B_trnsfrd TBW d_drop_cnt Passed Failed %passed

    Tx 65 00:03:617 28762960.00KB 16.93GBPS 0 65 0 100.00%

    Rx 65 00:03:617 25945592.00KB 09.44GBPS 0 65 0 100.00%

    ----------------------------------------------------------------------------------------------

    ---------------------------------------------------------------------------------------------

    ---------------------------------------------------------------------------------------------

    Total Bandwidth: 26.38GBPS, 3.46MPPS

    Total TX Bandwidth: 16.93GBPS, 2.22MPPS

    Total RX Bandwidth: 9.44GBPS, 1.24MPPS

    Total data drop count :0

    ---------------------------------------------------------------------------------------------

    Full Forms:

    TBW: Total Bandwidth

    IBW: Interval Bandwidth

    MIBW: Mean Interval Bandwidth

    HIBW: Highest Interval Bandwidth

    LIBW: Lowest Interval Bandwidth

    Please refer to perfq_log_20230125_120720.txt for more details

    total_drops:0

    [root@BAPVECISE040T perfq]# ./build/mcdma-test -- -b 0000:01:00.0 -p 256 -l 2 -z -c 65 -a 1 -d 4

    EAL: Detected 16 lcore(s)

    EAL: Detected 1 NUMA nodes

    EAL: Multi-process socket /var/run/dpdk/rte/mp_socket

    EAL: Selected IOVA mode 'PA'

    EAL: Probing VFIO support...

    EAL: PCI device 0000:01:00.0 on NUMA socket -1

    EAL: Invalid NUMA socket, default to 0

    EAL: probe driver: 1172:0 net_mcdma

    PMD: ifc_mcdma_get_hw_version(): MCDMA RTL VERSION : 0x50001

    PMD: ifc_mcdma_get_device_caps(): Max Supported by DPDK: 1024

    EAL: PCI device 0000:04:00.0 on NUMA socket -1

    EAL: Invalid NUMA socket, default to 0

    EAL: probe driver: 8086:15f3 net_igc

    EAL: PCI device 0000:05:00.0 on NUMA socket -1

    EAL: Invalid NUMA socket, default to 0

    EAL: probe driver: 8086:15f3 net_igc

    Allocating 65 Channels...


    ----------------------------------------------------

    BDF: 0000:01:00.0

    Channels Allocated: 65

    QDepth 508

    Number of pages: 8

    Completion mode: WB

    H2D Payload Size per descriptor: 256 Bytes

    D2H Payload Size per descriptor: 256 Bytes

    H2D SOF on descriptor: 1

    H2D EOF on descriptor: 1

    H2D File Size: 256 Bytes

    D2H SOF on descriptor: 1

    D2H EOF on descriptor: 1

    D2H File Size: 256 Bytes

    PKG Gen Files: 1

    File Size: 256 Bytes

    Tx Batch Size: 127 Descriptors

    Rx Batch Size: 127 Descriptors

    TID FIFO Checks: OFF

    AVST Example Design Interface

    DCA: OFF

    ----------------------------------------------------------

    Thread initialization in progress ...

    Thread is in READY state...

    Thread initialization done

    All Threads exited

    TIME OUT while waiting for completions

    Leaving...

    -------------------------------------OUTPUT SUMMARY------------------------------------------

    Dir #queues Time_elpsd B_trnsfrd TBW d_drop_cnt Passed Failed %passed

    Tx 65 00:03:902 15322550.00KB 07.68GBPS 0 65 0 100.00%

    Rx 65 00:03:902 1771967.50KB 00.58GBPS 0 65 0 100.00%

    ----------------------------------------------------------------------------------------------

    ---------------------------------------------------------------------------------------------

    ---------------------------------------------------------------------------------------------

    Total Bandwidth: 8.26GBPS, 34.65MPPS

    Total TX Bandwidth: 7.68GBPS, 32.21MPPS

    Total RX Bandwidth: 0.58GBPS, 2.44MPPS

    Total data drop count :0

    ---------------------------------------------------------------------------------------------

    Full Forms:

    TBW: Total Bandwidth

    IBW: Interval Bandwidth

    MIBW: Mean Interval Bandwidth

    HIBW: Highest Interval Bandwidth

    LIBW: Lowest Interval Bandwidth

    Please refer to perfq_log_20230125_120734.txt for more details

    total_drops:0

    [root@BAPVECISE040T perfq]# ./build/mcdma-test -- -b 0000:01:00.0 -p 256 -l 2 -z -c 256 -a 1 -d 4

    EAL: Detected 16 lcore(s)

    EAL: Detected 1 NUMA nodes

    EAL: Multi-process socket /var/run/dpdk/rte/mp_socket

    EAL: Selected IOVA mode 'PA'

    EAL: Probing VFIO support...

    EAL: PCI device 0000:01:00.0 on NUMA socket -1

    EAL: Invalid NUMA socket, default to 0

    EAL: probe driver: 1172:0 net_mcdma

    PMD: ifc_mcdma_get_hw_version(): MCDMA RTL VERSION : 0x50001

    PMD: ifc_mcdma_get_device_caps(): Max Supported by DPDK: 1024

    EAL: PCI device 0000:04:00.0 on NUMA socket -1

    EAL: Invalid NUMA socket, default to 0

    EAL: probe driver: 8086:15f3 net_igc

    EAL: PCI device 0000:05:00.0 on NUMA socket -1

    EAL: Invalid NUMA socket, default to 0

    EAL: probe driver: 8086:15f3 net_igc

    Allocating 256 Channels...


    ----------------------------------------------------

    BDF: 0000:01:00.0

    Channels Allocated: 256

    QDepth 508

    Number of pages: 8

    Completion mode: WB

    H2D Payload Size per descriptor: 256 Bytes

    D2H Payload Size per descriptor: 256 Bytes

    H2D SOF on descriptor: 1

    H2D EOF on descriptor: 1

    H2D File Size: 256 Bytes

    D2H SOF on descriptor: 1

    D2H EOF on descriptor: 1

    D2H File Size: 256 Bytes

    PKG Gen Files: 1

    File Size: 256 Bytes

    Tx Batch Size: 127 Descriptors

    Rx Batch Size: 127 Descriptors

    TID FIFO Checks: OFF

    AVST Example Design Interface

    DCA: OFF

    ----------------------------------------------------------

    Thread initialization in progress ...

    Thread is in READY state...

    Thread initialization done

    All Threads exited

    TIME OUT while waiting for completions

    Leaving...

    -------------------------------------OUTPUT SUMMARY------------------------------------------

    Dir #queues Time_elpsd B_trnsfrd TBW d_drop_cnt Passed Failed %passed

    Tx 256 00:03:976 8357743.00KB 04.02GBPS 0 256 0 100.00%

    Rx 256 00:03:976 6494101.75KB 02.08GBPS 0 256 0 100.00%

    ----------------------------------------------------------------------------------------------

    ---------------------------------------------------------------------------------------------

    ---------------------------------------------------------------------------------------------

    Total Bandwidth: 6.10GBPS, 25.57MPPS

    Total TX Bandwidth: 4.02GBPS, 16.86MPPS

    Total RX Bandwidth: 2.08GBPS, 8.71MPPS

    Total data drop count :0

    ---------------------------------------------------------------------------------------------

    Full Forms:

    TBW: Total Bandwidth

    IBW: Interval Bandwidth

    MIBW: Mean Interval Bandwidth

    HIBW: Highest Interval Bandwidth

    LIBW: Lowest Interval Bandwidth

    Please refer to perfq_log_20230125_120746.txt for more details

    total_drops:0


    [root@BAPVECISE040T perfq]#
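The pass/fail outcome of each run above is carried by the Tx and Rx rows of the OUTPUT SUMMARY table. When comparing several channel counts, those rows can be pulled out of a saved console log with a short script. This is a minimal Python sketch, assuming the row layout stays exactly as printed in these logs. `SummaryRow`, `parse_summary`, and `all_queues_passed` are hypothetical helpers, not part of the mcdma-test tool.

```python
import re
from typing import List, NamedTuple


class SummaryRow(NamedTuple):
    direction: str         # "Tx" or "Rx"
    queues: int            # "#queues" column
    transferred_kb: float  # "B_trnsfrd" column, in KB
    bandwidth_gbps: float  # "TBW" column, in the log's GBPS unit
    drops: int             # "d_drop_cnt" column
    passed: int
    failed: int


# Matches rows such as:
#   Tx 256 00:03:976 8357743.00KB 04.02GBPS 0 256 0 100.00%
ROW_RE = re.compile(
    r"^\s*(Tx|Rx)\s+(\d+)\s+\S+\s+([\d.]+)KB\s+([\d.]+)GBPS"
    r"\s+(\d+)\s+(\d+)\s+(\d+)\s+[\d.]+%\s*$"
)


def parse_summary(log_text: str) -> List[SummaryRow]:
    """Extract every Tx/Rx summary row found in the console log text."""
    rows = []
    for line in log_text.splitlines():
        match = ROW_RE.match(line)
        if match:
            direction, queues, kb, bw, drops, passed, failed = match.groups()
            rows.append(SummaryRow(direction, int(queues), float(kb),
                                   float(bw), int(drops), int(passed), int(failed)))
    return rows


def all_queues_passed(rows: List[SummaryRow]) -> bool:
    """True only if at least one row exists and no queue failed or dropped data."""
    return bool(rows) and all(
        r.failed == 0 and r.drops == 0 and r.passed == r.queues for r in rows
    )


# Rows copied from the 256-channel run in the log above.
sample = """
    Tx 256 00:03:976 8357743.00KB 04.02GBPS 0 256 0 100.00%
    Rx 256 00:03:976 6494101.75KB 02.08GBPS 0 256 0 100.00%
"""
rows = parse_summary(sample)
print(rows)
print("all queues passed:", all_queues_passed(rows))
```

A check like this only reflects what the summary table reports. It does not look at other log lines, such as the "TIME OUT while waiting for completions" message that appears in each run.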



  • Wincent_Altera (Regular Contributor)

    Hi,

    I wish to follow up with you about this case.

    Do you have any further questions on this matter?

    Otherwise, I would like your permission to close this forum ticket.

    Regards,

    Wincent_Intel


  • Wincent_Altera (Regular Contributor)

    Hi

    We have not heard from you and this case is idle. It is not recommended to leave a case idle for too long.

    Therefore, following our support policy, I have to put this case in closed status. My apologies for any inconvenience caused.

    This thread will be transitioned to community support.

    If you have a new question, feel free to open a new thread to get support from Intel experts.

    Otherwise, the community users will continue to help you on this thread. Thank you.

    If you feel your support experience was less than a 9 or 10, please allow me to correct it before closing, or let me know the cause so that I may improve your future support experience.

    Regards,

    Wincent_Intel


  • grspbr (Occasional Contributor)

    Hi @Wincent_Altera

    I'm sorry for the long delay. Since we went with a workaround, which was to use 64 channels, we were able to move forward. Still, we will try again with the project you sent when time permits.

    I have looked at the package you sent and have a few notes:
    1) It contains a Stratix 10 target whereas we use Agilex, so the IP had to be upgraded, which was successful.
    2) The settings did not include CVP, which is how we load the design. Presumably this is not a problem.
    3) We need to provide the PCIE pinout to try it on our eval board.
    4) The design does indeed use 256 channels, and I noted that the test report indicated it was 100% successful when configured for more than 64 channels, up to 256 channels. We will try again from our end when time permits and open another thread if necessary.

    Thank you for your support. You were very helpful.

    Regards,

    Greg