Forum Discussion

HSuh01
New Contributor
6 years ago
Solved

Our host PC can't discover the Stratix 10 MX devkit as a PCIe device, even after a successful configuration and a proper installation of the Linux driver.

We are trying to use Intel's DMA IP and the PCIe Hard IP+ to communicate with the HBM memory. So far, we've tested the AN881 design, the AVMM PCIe Hard IP+ example design, and the PCIe golden design that comes with the board.

None of them was successful: we can't discover our FPGA board as a PCIe device on the Linux server (host PC) that we are using.

The driver was compiled with the same GCC version (gcc version 4.8.5 20150623) that was used to compile our Red Hat Linux kernel (3.10.0-1062.9.1.el7.x86_64).
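A quick way to double-check the kernel release and the compiler the kernel was built with is to parse `/proc/version`. The sketch below is only illustrative: `kernel_build_info` is a hypothetical helper, the sample string is made up to match the versions quoted above, and the `gcc version X.Y.Z` wording is assumed to be what older kernels such as 3.10 print (newer kernels format this differently).

```python
import re

def kernel_build_info(version_text):
    """Extract (kernel_release, gcc_version) from /proc/version-style text."""
    release = re.search(r"Linux version (\S+)", version_text)
    gcc = re.search(r"gcc version (\d+(?:\.\d+)+)", version_text)
    return (release.group(1) if release else None,
            gcc.group(1) if gcc else None)

# Illustrative sample only; on a real host, read the text from /proc/version.
sample = ("Linux version 3.10.0-1062.9.1.el7.x86_64 (builder@example) "
          "(gcc version 4.8.5 20150623 (GCC) ) #1 SMP")
print(kernel_build_info(sample))
```

Comparing the second value with the output of `gcc --version` on the build machine shows whether the driver and the kernel were built with matching compilers.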

We verified that the driver is installed and correctly loaded by running "lsmod":

>> intel_fpga_pcie_drv 22470 0
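Note that an `lsmod` entry only shows that the module is in the kernel; it does not show that the driver found a device. A minimal sketch of checking the entry programmatically follows. `find_module` is a hypothetical helper, and the sample text simply mirrors the line above with the usual `lsmod` header added.

```python
def find_module(lsmod_text, name):
    """Return (size, use_count) for `name` in lsmod-style text, or None."""
    for line in lsmod_text.splitlines():
        fields = line.split()
        # Columns are: Module, Size, Used by (a count, then optional users).
        if len(fields) >= 3 and fields[0] == name:
            return int(fields[1]), int(fields[2])
    return None

sample = """Module                  Size  Used by
intel_fpga_pcie_drv    22470  0
"""
print(find_module(sample, "intel_fpga_pcie_drv"))
```

On a real host, the text could come from `subprocess.run(["lsmod"], capture_output=True, text=True).stdout`.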

<< Environmental info >>

Quartus version: 19.2

Design used: AN881 design example, PCIe Hard IP+ design example

FPGA power source: 8-Pin from ATX power supply unit

Configuration scheme used: JTAG via FPGA Blaster II (USB)

* The FPGA board was enumerated correctly after we configured the bitstream for PCIe communication. We kept the FPGA board powered on so that it retains its configuration across the host PC's reboot.

So far, even with the successful configuration and a proper driver installation, we couldn't discover our FPGA board as a PCIe device on the host PC.

The FPGA devkit sits in PCIe slot 3, but it did not appear when we listed PCIe devices with the "lspci" command. The slot should be shown as "In use", not "Available" (which means the slot is empty).
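Besides reading the full `lspci` listing, the same check can be scripted against sysfs. This is a rough sketch under two assumptions: the design keeps the Altera vendor ID `0x1172` (a regenerated design may use a different ID), and `find_pci_devices` is a hypothetical helper name. An empty result means the endpoint never enumerated, which matches what is described above.

```python
import os

# Assumption: the example design keeps the default Altera vendor ID.
ALTERA_VENDOR_ID = 0x1172

def find_pci_devices(vendor_id, sysfs_root="/sys/bus/pci/devices"):
    """Return sorted PCI addresses whose sysfs 'vendor' file matches vendor_id."""
    matches = []
    if not os.path.isdir(sysfs_root):
        return matches
    for addr in sorted(os.listdir(sysfs_root)):
        vendor_path = os.path.join(sysfs_root, addr, "vendor")
        try:
            with open(vendor_path) as f:
                value = int(f.read().strip(), 16)  # file holds e.g. "0x1172"
        except (OSError, ValueError):
            continue  # skip entries that cannot be read or parsed
        if value == vendor_id:
            matches.append(addr)
    return matches

print(find_pci_devices(ALTERA_VENDOR_ID))
```

The equivalent one-liner on the command line is `lspci -d 1172:`, which filters the listing by vendor ID.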

Also, we can see that some devices report power failures when we boot the host PC.

Here is the list of devices that report power failures. We looked them up by bus number.

We also tested the golden design file that comes with the devkit. The design itself works, and we can verify it through the Board Test System (BTS) utility on Windows. However, we still can't discover the board in the PCIe device list, even with this golden design.

Is there a specific way to set up the FPGA board so that it is recognized as a proper PCIe device on the host PC?

Any input will be welcomed. Thank you.

  • Hi Hsuh,

    The initial post mentioned AN811, but it now says AN881, so I am just curious whether you have already tested AN881.

    1. The AN881 example design uses v19.1 Pro. Did you run the post-processing script (section 2.2) if you regenerated the Platform Designer system or upgraded the design?
    2. Does your MX board carry the following device: 1SM21BHU2F53E1VG?
    3. Are you using the driver that comes with the AN881 example design?
    4. Have you had a chance to validate it on CentOS 7.0 to see whether there is any OS dependency?
    5. Could you please capture the ltssmstate[5:0], currentspeed[1:0], lane_act[4:0], and link_up signals in Signal Tap? This can help confirm whether link training completes correctly.

    Regards -SK
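The four signals requested above can be summarised per sample with a small decoder. The sketch below is an assumption-laden illustration, not Intel's definition: the speed encoding (01/10/11 for Gen1/Gen2/Gen3), the one-hot reading of lane_act, and the two LTSSM codes (00h as Detect.Quiet, 11h as L0) are assumed from the values reported in this thread and should be checked against the Hard IP user guide. `decode_link` is a hypothetical helper.

```python
# Assumed encodings; verify against the Hard IP user guide for your device.
SPEED = {0b01: "Gen1", 0b10: "Gen2", 0b11: "Gen3"}
LTSSM = {0x00: "Detect.Quiet", 0x11: "L0"}  # only the two codes seen in this thread

def decode_link(ltssmstate, currentspeed, lane_act, link_up):
    """Summarise one Signal Tap sample of the link-status signals."""
    speed = SPEED.get(currentspeed, "undefined")
    # lane_act is assumed one-hot, so its value equals the active lane count.
    one_hot = lane_act > 0 and (lane_act & (lane_act - 1)) == 0
    width = "x%d" % lane_act if one_hot and lane_act <= 16 else "invalid"
    state = LTSSM.get(ltssmstate, "0x%02X" % ltssmstate)
    healthy = bool(link_up) and state == "L0" and width != "invalid"
    return {"state": state, "speed": speed, "width": width, "healthy": healthy}

# Values from this thread: link up, then the state right after power-on.
print(decode_link(0x11, 0x3, 0x10, 1))
print(decode_link(0x00, 0x1, 0x10, 0))
```

Under these assumptions, the first sample decodes as L0 at Gen3 x16 with the link up, and the second as a link still stuck in detection.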

15 Replies

  • SengKok_L_Intel
    Regular Contributor

    Hi,

    The PCIe link failed to come up at slot 4, which is why the host can't detect the device.

    Here is the result that I can see from your post:

    Slot 1: Link up with Gen3 x 4

    Slot 2: Link up with Gen3 x 4

    Slot 3: Failed to link up

    Slot 4: Failed to link up

    After you program the .sof file into the FPGA and the host performs a warm reboot, the host asserts PERST, which resets the FPGA and restarts link training. When you capture with Signal Tap, please set the trigger condition to the rising edge of the "pin_perst" signal. This can help confirm whether the FPGA is reset properly during the warm reboot.

    Regards -SK

  • HSuh01
    New Contributor

    I got the results, but I am worried because they clearly show that the design is not working properly.

    When I posted the earlier Signal Tap results for slots 1 through 4, I thought that at least slots 1 and 2 were working, as you noted in your last answer.

    However, it turns out that those results were not correct. For those tests I only rebooted the host PC to enumerate the FPGA board; I did not do a complete power-off and power-on.

    Today, while testing, I powered off the host PC completely and turned it on again for enumeration. I found that none of the 4 slots (PCIe slots 1 through 4) is working.

    Please see the attached screenshots for the results I got.

    1. Right after the PC is powered on, the link is in the Gen1 x16 state and ltssm shows 00h.

    2. After a few seconds, the state changes, but the PCIe link is still not up.

    3. Also, our PC sometimes does not boot up and just stays on a black screen. Nothing special or different shows up in the Signal Tap status when this happens (it looks the same as the two screenshots above).

    The weird thing is that when I stop data acquisition in Signal Tap, it shows ltssm = 11h, currentspeed = 03h, lane_act = 10h, and link_up = 1, which is clearly what we want to see.

    This occurs randomly.

    Also, I have a request for you. I have very good reason to believe that the PCIe Hard IP+ is not working in our environment (Quartus 19.2 with device variant 1SM21BHU2F53E2VGS1). When I synthesize the PCIe Hard IP+, I get multiple timing errors in the main IP itself, not in the interconnect between modules and IPs.

    Could you test this design, AN881, on the ES variant of the Stratix 10 MX (1SM21BHU2F53E2VGS1)? Please do not test it on the 1SM21BHU2F53E2VG (which was clearly used only in the development stage, for internal use at Intel).

    If you can successfully run this design, please let me know which Linux distro, Quartus version, and Linux driver were used, and I will try to replicate it on my side. If you can share the bitstream used for FPGA configuration (.sof file), that would be very helpful.

  • SengKok_L_Intel
    Regular Contributor

    Hi,

    We tested the design with the 1SM21BHU2F53E1VG in the past, and it linked up properly. This is a production device, the same as the device at this link: https://www.intel.com/content/www/us/en/programmable/products/boards_and_kits/dev-kits/altera/kit-s10-mx.html

    The 1SM21BHU2F53E2VGS1 is the engineering sample device, and that board is currently not available for testing.

    When you generated the example design with the PCIe IP AVMM GUI, did you select "Stratix 10 MX H-Tile ES1 FPGA Development Kit"? Were you able to try a different server/host?

    Besides, you may also try using "Avalon ST Intel Stratix 10 hard IP for PCI express" or "Avalon-MM Intel Stratix 10 Hard IP for PCI Express" from the IP catalog to generate the example design, and see whether it links up at Gen3 x8 as expected and is detected by the host. This can help rule out a dependency on the design example.

    Regards -SK


  • SengKok_L_Intel
    Regular Contributor

    Just to update: we have also tested the 1SM21BHU2F53E1VG with the Avalon-MM design generated from the PCIe GUI, and it works as well. Since there has been no recent activity, I will place this case in close-pending status for now. If you have further questions, please do not hesitate to get back to us within the next 20-day close-pending period.

    Regards -SK

  • Yanhan
    New Contributor

    Hi, I ran into the same problem. Have you solved it? My device is 1SG280LN2F43E2VG.