Forum Discussion

He4Forum
Occasional Contributor
3 years ago
Solved

Issues when testing DMA feature provided by Avalon MM+ IP with S10 GX dev kit card on Ubuntu 18.04

I have installed the S10 GX dev kit FPGA card in a machine running Ubuntu 18.04 and programmed the Avalon MM+ Hard IP onto the card.

However, I ran into some issues when testing the DMA feature.

With the test application located at ~/avmm_bridge_512_0_example_design/software/user/example/intel_fpga_pcie_link_test, none of the test modes runs successfully.

For example, the first test, "Doing 100 writes and 100 reads", failed. The results of the first 10 tries are below.

At dword 0x0
Wrote 0x120aed54
Read  0xffffffff
At dword 0x1
Wrote 0x4db4d210
Read  0xffffffff
At dword 0x2
Wrote 0x616cd2bb
Read  0xffffffff
At dword 0x3
Wrote 0x29ad0aff
Read  0xffffffff
At dword 0x4
Wrote 0x31f54bf7
Read  0xffffffff
At dword 0x5
Wrote 0x7f08785d
Read  0xffffffff
At dword 0x6
Wrote 0x4fde1236
Read  0xffffffff
At dword 0x7
Wrote 0x401d6525
Read  0xffffffff
At dword 0x8
Wrote 0x44a0f03b
Read  0xffffffff
At dword 0x9
Wrote 0x49ebcbdc
Read  0xffffffff

In every test run, the value read back from memory is 0xffffffff. The read and write operations themselves report no errors; only the values don't match.

Number of write errors:       0
Number of read errors:        0
Number of dword mismatches: 100
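
A note on reading this result: on PCIe, a read that never receives a valid completion is typically handed back to software as all ones, so 100 read-backs of 0xffffffff with zero reported errors usually points at an unreachable target rather than corrupted data. The following is a minimal sketch of how a link test could tell the two cases apart. The `Write32` and `Read32` callables are hypothetical stand-ins, not the real intel_fpga_pcie API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Summary of one write-then-read-back pass over a memory window.
struct LinkTestSummary {
    std::size_t mismatches = 0;  // read-back value differs from what was written
    std::size_t all_ones = 0;    // read-back value is 0xffffffff
};

// Hypothetical stand-ins for the driver's 32-bit accessors (assumption:
// each takes a dword index; the real API may look different).
using Write32 = std::function<void(std::size_t, std::uint32_t)>;
using Read32 = std::function<std::uint32_t(std::size_t)>;

LinkTestSummary run_link_test(const std::vector<std::uint32_t>& pattern,
                              const Write32& write32, const Read32& read32) {
    LinkTestSummary s;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        write32(i, pattern[i]);
        const std::uint32_t got = read32(i);
        if (got != pattern[i]) ++s.mismatches;
        if (got == 0xffffffffu) ++s.all_ones;
    }
    return s;
}

// True when every read-back was all ones and at least one value mismatched:
// the signature of a target that never answered.
bool looks_unreachable(const LinkTestSummary& s, std::size_t total) {
    assert(total > 0);
    return s.all_ones == total && s.mismatches > 0;
}
```

Reporting the all-ones count next to the mismatch count would make a wrong BAR selection or a dropped device stand out immediately in output like the above.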

Looking into the code, I could not find where the device memory address is initialized. It is just set to NULL.

char *addr = NULL;

I am wondering why the test application does not use a memory map to get an address mapped to the FPGA card memory. I'm not sure whether it is initialized in another function, or whether the NULL address is what causes the mismatches.

However, the other test modes do not work either, so there may be other causes.

When I installed the PCIe driver for the FPGA card, I had to modify some of the provided driver code, which targets CentOS, so that it would install on Ubuntu. I'm not sure whether my modifications cause any issues.

I hope someone has ideas about these issues. Thank you, and I look forward to your replies.

  • The problem you are seeing might be related to the knowledge base article below:

    https://www.intel.com/content/www/us/en/programmable/support/support-resources/knowledge-base/ip/2019/why-does-the-intel--stratix--10-avalon--mm-interface-for-pci-exp.html


    Why does the Intel® Stratix® 10 Avalon®-MM Interface for PCIe* with DMA example design fail the link test and the DMA test when using the default selected BAR0?

    Description

    When the internal DMA Descriptor Controller is enabled, the BAR0 Avalon®-MM master is not available for general-purpose usage. The DMA Descriptor Controller uses this BAR0 interface through which the host CPU programs in the descriptor table.

    The intel_fpga_pcie_link_test user application selects BAR0 as default when it's initially executed. If the user forgets to change to BAR2, which is where the onchip memory is attached, then both the link test and the DMA test will fail.

    Resolution

    The user must change to BAR2 before executing the link test and the DMA test.

    See the execution transcript of the intel_fpga_pcie_link_test user application below for how to change to BAR2.

    ~$ sudo ./intel_fpga_pcie_link_test
    *********************************************************
    Intel FPGA PCIe Link Test
    Version 2.0
    0: Automatically select a device
    1: Manually select a device
    *********************************************************
    > 0
    Opened a handle to BAR 0 of a device with BDF 0x1300
    *********************************************************
    0: Link test - 100 writes and reads
    1: Write memory space
    2: Read memory space
    3: Write configuration space
    4: Read configuration space
    5: Change BAR
    6: Change device
    7: Enable SRIOV
    8: Do a link test for every enabled virtual function
       belonging to the current device
    9: Perform DMA
    10: Quit program
    *********************************************************
    > 5
    Changing BAR...
    Enter BAR number (-1 for none):
    > 2
    Successfully changed BAR!

29 Replies

  • skbeh
    Contributor

    Good to know that the USB connection issue could potentially be caused by high temperature.

    That makes sense: since the DMA test can run for up to 20k cycles, it no longer looks like a design issue.



    • He4Forum
      Occasional Contributor

      Yep. And thank you very much for your support and patience.

  • He4Forum
    Occasional Contributor

    I have also tested a simple read & write operation.

        //...
        result = dev->write32(reinterpret_cast<void *>(addr), write_data);
        if (result == 1) {
            cout << "Wrote successfully!" << endl;
        } else {
            cout << "Write failed!" << endl;
        }
    
        uint32_t test_read = 0;
        result = dev->read32(reinterpret_cast<void *>(addr), &test_read);
        if (result == 1) {
            cout << "Read successfully!" << endl;
        } else {
            cout << "Read failed!" << endl;
        }
        cout << "Read number : " << test_read << endl;
        //...

    I got the wrong result again.

    > Enter address to write, in hex: 0000f000
    > Enter 32-bit data to write, in hex: 12341234
    > Writing 0x12341234 at BDF 0x1a00 BAR 0 offset 0xf000..
    Wrote successfully!
    Read successfully!
    Read number : 0xffffffff

    The value read back is 0xffffffff, the same as in the "Doing 100 writes and 100 reads" test. This time the address was set manually rather than left as NULL.

    • skbeh
      Contributor

      As I understand it, you are testing PCIe link-up on the Stratix 10 SX SoC Development Kit (DK-SOC-1SSX-L-D), described at the link below, on Ubuntu, using driver code that was provided for CentOS and that you modified.
      https://www.intel.com/content/www/us/en/products/details/fpga/development-kits/stratix/10-sx.html
      I'm not clear whether the issue was caused by the driver code modification itself.
      First of all, though, can you check whether the PCIe link comes up, e.g. by using lspci?
      Do you have a PCIe endpoint card plugged into this SoC dev kit, with the kit acting as root port? Or are you using the BTS (Board Test System) to test the loopback?

      • He4Forum
        Occasional Contributor

        Hi skbeh,

        I am using the Stratix 10 GX FPGA Development Kit (DK-DEV-1SGX-L-A), and I have checked that the FPGA card is detected via PCIe.

        $ lsmod | grep intel_fpga_pcie_drv
        intel_fpga_pcie_drv    32768  2
        
        $ lspci -d 1172:000 -v
        1a:00.0 Unassigned class [ff00]: Altera Corporation Device 0000 (rev ff) (prog-if ff)
                !!! Unknown header type 7f
                Kernel driver in use: intel_fpga_pcie_drv
                Kernel modules: altera_cvp
        
        $ lspci -d 1172:000 -v | grep intel_fpga_pcie_drv
        Kernel driver in use: intel_fpga_pcie_drv

        The FPGA card is simply plugged into a PCIe slot on a host machine running Ubuntu 18.04.

        As the README for the FPGA card driver says,

        TESTING
        -------
        The driver was developed and tested on CentOS 7.0, 64-bit with
        3.10.514 kernel compiled for x86_64 architecture.

        and when I installed the driver with no modifications, it hit errors. The errors seem to be caused by the Linux kernel version: 5.10 on my machine versus 3.10 in the README. I fixed them manually and got the driver installed successfully.
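
As a cross-check between the two tools: the test application reports the device as "BDF 0x1a00", while lspci lists it as 1a:00.0. Assuming the application uses the conventional 16-bit packing (8-bit bus, 5-bit device, 3-bit function), the two agree. A small sketch of the conversion, where `bdf_to_string` is a hypothetical helper and not part of the driver package:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Format a 16-bit BDF the way lspci prints it: "bb:dd.f".
// Assumption: bus in bits 15..8, device in bits 7..3, function in bits 2..0.
std::string bdf_to_string(std::uint16_t bdf) {
    const unsigned bus = (bdf >> 8) & 0xffu;
    const unsigned dev = (bdf >> 3) & 0x1fu;
    const unsigned fn = bdf & 0x7u;
    char buf[16];
    std::snprintf(buf, sizeof buf, "%02x:%02x.%x", bus, dev, fn);
    return buf;
}
```

With this packing, 0x1a00 maps to "1a:00.0", so both tools are looking at the same slot.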

  • skbeh
    Contributor

    There are application tests 0-9, as shown below.

    Can you confirm whether the USB Blaster II disconnect issue happens only in "9: Perform DMA"?

    Or does it happen in both "0: Link test - 100 writes and reads" and "9: Perform DMA"?


    0: Link test - 100 writes and reads
    1: Write memory space
    2: Read memory space
    3: Write configuration space
    4: Read configuration space
    5: Change BAR
    6: Change device
    7: Enable SRIOV
    8: Do a link test for every enabled virtual function
       belonging to the current device
    9: Perform DMA


    • He4Forum
      Occasional Contributor

      The issue affects both "0: Link test - 100 writes and reads" and "9: Perform DMA".

      When the card works well, choosing "0: Link test - 100 writes and reads" gives a result like this.

      Doing 100 writes and 100 reads..
      Number of write errors:       0
      Number of read errors:        0
      Number of dword mismatches:   0

      Choosing "9: Perform DMA" gives:

      *********************************************************
      Current DMA configurations
          Run Read  (card->system)  ? 1
          Run Write (system->card)  ? 1
          Run Simultaneous          ? 1
          Number of dwords/desc     : 2048
          Number of descriptors     : 128
          Total length of transfer  : 1e+03 KiB
      *********************************************************
       0: Run DMA
       1: Toggle read DMA
       2: Toggle write DMA
       3: Toggle simultaneous DMA
       4: Set the number of dwords per descriptor
       5: Set the number of descriptors per DMA
       6: Return to main menu
      *********************************************************
      

      Then I choose "0: Run DMA", which prompts:

      Enter the number of DMA operations to initiate; enter 0 for infinite loop:

      I enter "2" and get the following result.

      
      *********************************************************
      Current DMA configurations
          Run Read  (card->system)  ? 1
          Run Write (system->card)  ? 1
          Run Simultaneous          ? 1
          Number of dwords/desc     : 2048
          Number of descriptors     : 128
          Total length of transfer  : 1e+03 KiB
      
      Current run #: 2
      Current time : Wed Mar 23 13:49:54 2022
      
      DMA throughputs, in GB/s (10^9B/s)
          Current Read Throughput   :  0.01
          Average Read Throughput   :  0.01
          Current Write Throughput  :  0.01
          Average Write Throughput  :  0.01
          Current Simul Throughput  :  0.01
          Average Simul Throughput  :  0.01
      *********************************************************
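
For reference, the configuration above reports 2048 dwords per descriptor and 128 descriptors. At 4 bytes per dword that is 1,048,576 bytes, i.e. 1024 KiB, which is consistent with the rounded "1e+03 KiB" the tool prints. A short sketch of the arithmetic, with helper names that are illustrative only and not taken from the example design:

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kBytesPerDword = 4;

// Total bytes moved by one DMA run: dwords per descriptor x descriptors x 4.
constexpr std::uint64_t dma_total_bytes(std::uint64_t dwords_per_desc,
                                        std::uint64_t num_desc) {
    return dwords_per_desc * num_desc * kBytesPerDword;
}

// Throughput in GB/s (10^9 B/s), the unit the tool reports.
double throughput_gbps(std::uint64_t bytes, double seconds) {
    assert(seconds > 0.0);  // guard against a zero elapsed time
    return static_cast<double>(bytes) / seconds / 1e9;
}
```

Here `dma_total_bytes(2048, 128)` evaluates to 1048576, and dividing by 1024 gives 1024 KiB.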
      

      When the USB Blaster II disconnect issue happens, neither test works.

      "0: Link test - 100 writes and reads" :

      Number of write errors:       0
      Number of read errors:        0
      Number of dword mismatches: 100

      "9: Perform DMA" :

      
      Current run #: 1
      Current time : Wed Mar 23 13:53:10 2022
      
      DMA throughputs, in GB/s (10^9B/s)
          Current Read Throughput   :  0.00
          Average Read Throughput   :   inf
          Current Write Throughput  :  0.00
          Average Write Throughput  :   inf
          Current Simul Throughput  :  0.00
          Average Simul Throughput  :   inf
      *********************************************************
      Stopping DMA run due to error..
      
      

      I also noticed that while the USB Blaster II is working, running the command

      $ lsusb

      lists the FPGA card:

      lsusb : Bus 001 Device 002: ID 09fb:6810 Altera

      When the USB Blaster II is disconnected, the entry disappears.
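
To pin down when the disconnect happens, the lsusb check can be automated and polled alongside the DMA test. The sketch below only covers the parsing step: given captured `lsusb` output as text, it reports whether the 09fb:6810 entry is still listed. `usb_id_present` is a hypothetical helper, and capturing the command output is left to the caller.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Return true if any line of captured `lsusb` output lists the given
// "vvvv:pppp" vendor:product ID.
bool usb_id_present(const std::string& lsusb_output, const std::string& id) {
    std::istringstream lines(lsusb_output);
    std::string line;
    const std::string needle = "ID " + id;
    while (std::getline(lines, line)) {
        if (line.find(needle) != std::string::npos) return true;
    }
    return false;
}
```

Logging a timestamp the first time this returns false for "09fb:6810" would show whether the drop lines up with a particular DMA run.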

      • skbeh
        Contributor

        I see. When the USB Blaster II is connected, both tests pass ("0: Link test - 100 writes and reads" and "9: Perform DMA").

        When the USB Blaster II is disconnected, the tests will of course fail, since the connection has been lost.

        Can you identify under what conditions the USB Blaster II starts to lose its connection? In the middle of running "9: Perform DMA"? Or after both test #0 and test #9 have completed and you start another round of tests?

  • skbeh
    Contributor

    Please try deselecting the "Enable burst capability for Avalon-MM BAR0 Master port" (HPRXM BAR0) option as shown below, then regenerate the RTL, recompile, and configure the example design again.

    Let's see whether disabling burst mode for BAR0 allows the DMA test to work, because it looks like the HPRXM (burst mode) module and the DMA module conflict on BAR0 and cause the problem.

    Let me know your test result after this change.


    • He4Forum
      Occasional Contributor

      Could you please resend the picture? It could not be loaded on my PC. Thanks.

      • skbeh
        Contributor

        I have resent the picture. I hope you can see it.

    • He4Forum
      Occasional Contributor

      Unfortunately, the same error still occurs.

      This time the DMA test ran a bit longer than before. The USB connection dropped after 20,000 test loops.

      During the test, I also noticed that the USB cable had been connected to a USB 3.0 port on the motherboard, so I moved it to a USB 2.0 port. It made no difference to the result.

    • He4Forum
      Occasional Contributor

      How long it takes before the USB disconnects seems very random. It may happen right after I start the test, or after the test has been running for a while.