Forum Discussion

wgshnyhlw
New Contributor
2 years ago

Cannot find device after installing two Arria 10 GX PACs in a server

Hi everyone,

I want to use two acceleration cards with Arria 10 GX FPGAs so that our system can process larger amounts of data in parallel. Everything worked well before I plugged the second PAC into the server.

After I plugged in the second PAC, added a new PCI device in VMware on the server, and reinstalled development stack v1.2.1, the Linux kernel driver installation output was incomplete. As my screenshot showed, 'lsmod | grep fpga' does not list 'intel_fpga_fme', 'intel_fpga_pac_hssi', and several others. As a result, 'sudo fpgainfo fme' finds no device.
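A quick way to see exactly which modules are missing is to compare the `lsmod` output against a list of expected module names. Below is a minimal Python sketch; the `EXPECTED` list is an assumption that contains only the two modules named in this post, so it would need to be extended with the full list for your OPAE release.

```python
import subprocess

# Assumption: only the two modules named in this thread. The full list of
# modules depends on the OPAE driver release you installed.
EXPECTED = ["intel_fpga_fme", "intel_fpga_pac_hssi"]


def loaded_modules(lsmod_text):
    """Return the set of module names in `lsmod` output (header line skipped)."""
    names = set()
    for line in lsmod_text.splitlines()[1:]:
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def missing_modules(lsmod_text, expected=EXPECTED):
    """Return the expected modules that do not appear in the `lsmod` output."""
    loaded = loaded_modules(lsmod_text)
    return [name for name in expected if name not in loaded]


if __name__ == "__main__":
    try:
        out = subprocess.run(
            ["lsmod"], stdout=subprocess.PIPE, universal_newlines=True, check=True
        ).stdout
    except (OSError, subprocess.CalledProcessError) as exc:
        print("could not run lsmod:", exc)
    else:
        missing = missing_modules(out)
        print("missing:", ", ".join(missing) if missing else "none")
```

An empty "missing" result only means the listed modules are loaded; it does not prove the driver is bound to each card.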

I confirmed that both PACs appear in the PCI list, but the two cards seem to have the same name, 09c5. Is that normal?

My question:

1. Why do the two acceleration cards have the same name in the PCI list?

2. I have followed the troubleshooting section 'F.5. Troubleshooting OPAE Installation on RHEL' to update the Linux kernel and reinstall the software, but it did not help. Can two acceleration cards be used at the same time? Can anyone suggest another way to solve this problem?
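Since F.5 involves updating the kernel, one possibility worth ruling out (an assumption, not a confirmed diagnosis) is that the FPGA modules were never built for the kernel that is now running, so there is nothing for the system to load. A minimal Python sketch of that check: it walks `/lib/modules/<running kernel>` and reports which expected module names have no matching `.ko` file. The module names are the two mentioned above and are only a partial list; the comparison treats `-` and `_` as equivalent, as the kernel's module tools do.

```python
import os
import platform

# Assumption: partial list, taken from the module names in this thread.
EXPECTED = ["intel_fpga_fme", "intel_fpga_pac_hssi"]
SUFFIXES = (".ko", ".ko.xz", ".ko.gz", ".ko.zst")


def module_stem(filename):
    """Strip a kernel-module suffix and normalize '-' to '_'; None if not a module."""
    for suffix in SUFFIXES:
        if filename.endswith(suffix):
            return filename[: -len(suffix)].replace("-", "_")
    return None


def find_module_files(module_root, expected=EXPECTED):
    """Map each expected module name to the matching files under module_root."""
    wanted = {name.replace("-", "_"): name for name in expected}
    found = {name: [] for name in expected}
    for dirpath, _dirnames, filenames in os.walk(module_root):
        for filename in filenames:
            stem = module_stem(filename)
            if stem in wanted:
                found[wanted[stem]].append(os.path.join(dirpath, filename))
    return found


if __name__ == "__main__":
    kernel = platform.release()  # same value as `uname -r`
    root = os.path.join("/lib/modules", kernel)
    for name, paths in find_module_files(root).items():
        print(name, "->", paths if paths else "NOT FOUND for kernel " + kernel)
```

If the files exist only under an older kernel's directory, that would point toward a driver build problem rather than a second-card problem.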

I hope to get some useful support. Thanks.

17 Replies

  • JohnT_Altera
    Regular Contributor

    Hi,


    1. Both cards have the same name because they are the same board model: 09c5 is the PCI device ID, so every card of this type reports it. The number on the left (the PCI bus address assigned during enumeration) is different for each card, and that is what the host uses to tell them apart.
    2. Both cards can be used at the same time without any issue. Can you check whether the driver is loaded for both cards? You can check with "lspci -s 13:00.0 -v" and "lspci -s 1b:00.0 -v".
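The same per-card check can be done for every card at once by reading sysfs instead of running `lspci -s` for each address. A minimal Python sketch: it lists every PCI function whose vendor/device ID is 8086:09c5 (0x8086 is Intel's vendor ID; 09c5 is the device ID both cards show in this thread) together with the kernel driver bound to it, if any. The default root `/sys/bus/pci/devices` is the standard Linux sysfs location; the function and constant names are illustrative.

```python
import os

PCI_ROOT = "/sys/bus/pci/devices"
VENDOR_ID = "0x8086"  # Intel
DEVICE_ID = "0x09c5"  # ID reported by both PACs in this thread


def read_id(path):
    """Read a sysfs ID file such as 'vendor' or 'device' (e.g. '0x8086')."""
    with open(path) as f:
        return f.read().strip().lower()


def pac_devices(root=PCI_ROOT, vendor=VENDOR_ID, device=DEVICE_ID):
    """Return (address, driver name or None) for each matching PCI function."""
    result = []
    if not os.path.isdir(root):
        return result
    for addr in sorted(os.listdir(root)):
        dev_dir = os.path.join(root, addr)
        try:
            if read_id(os.path.join(dev_dir, "vendor")) != vendor:
                continue
            if read_id(os.path.join(dev_dir, "device")) != device:
                continue
        except OSError:
            continue  # not a device directory, or unreadable
        # A bound driver appears as a 'driver' symlink to /sys/bus/pci/drivers/<name>.
        link = os.path.join(dev_dir, "driver")
        driver = os.path.basename(os.readlink(link)) if os.path.islink(link) else None
        result.append((addr, driver))
    return result


if __name__ == "__main__":
    cards = pac_devices()
    if not cards:
        print("no 8086:09c5 devices visible")
    for addr, driver in cards:
        print(addr, "driver:", driver or "NONE (unbound)")
```

If both addresses appear but one or both show no driver, the cards are visible on the bus and the problem is on the driver side.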


    Thanks.


    • wgshnyhlw
      New Contributor

      Hi John,

      Yes, I have some questions. I removed one of the two Arria 10 GX PACs to try to fix the driver. I tried reinstalling the development kit, but it still does not seem to install successfully.

      After I reinstalled the development software, the "lspci" log shows that the PAC device 09c5 is detected.

      However, the "lsmod | grep fpga" output is still incomplete.

      Before I started using two PAC cards, the driver worked perfectly with my code.

      Questions:

      What is wrong with the driver now? What should I do first to fix it? I want to check whether I missed some steps before reinstalling the driver.

      If there are any documents showing the correct steps to run multiple PAC cards, I would really appreciate them.

      Thanks. Best wishes.

    • wgshnyhlw
      New Contributor

      Hi John,

      I followed the documentation exactly to install the development driver. Before I inserted the second PAC card into the server, everything worked fine. I followed the Quick Start Guide to reinstall the OPAE software, and also followed the section "Troubleshooting OPAE Installation on RHEL" to try to fix it. However, none of this helped.

      Do you have any other advice? How can I find the cause of the problem?

      Thank you.

  • JohnT_Altera
    Regular Contributor

    Hi,


    Can you run "lspci -vv" to see whether any driver is attached to the PAC card?


    • wgshnyhlw
      New Contributor

      Hi John,

      I attached the "lspci -vv" output.

      Can this information tell you what the problem is? Thanks for your support.

  • JohnT_Altera
    Regular Contributor

    Hi,


    Are you seeing the same driver on both cards? You have only provided information for one card.


    What do you observe when running "fpgainfo fme"?


    • wgshnyhlw
      New Contributor

      Hey John,

      I attached the lspci log for the second PAC card.

      "fpgainfo fme" prints "No device found".

  • JohnT_Altera
    Regular Contributor

    Hi,


    I see that when you grep for the OPAE driver, only two modules are listed. The example I attached shows what you should see when OPAE is installed correctly.



    • wgshnyhlw
      New Contributor

      Hi,

      I know the OPAE driver is not fully installed. Before I used two PAC cards, the OPAE driver was installed correctly and the output matched your picture.

      So I want to know what happened to the driver after I added the second PAC card. It is strange, because I expected the driver to detect a newly plugged-in card automatically. However, the driver does not behave the way I expected.

      I have tried reinstalling the OPAE driver, but it still did not help. Do you have other suggestions?

  • JohnT_Altera
    Regular Contributor

    Hi,


    Can you remove one of the cards and see whether you can still detect the remaining card without any issue? I have tried this on my side, and connecting another card to the system caused no issue.


    • wgshnyhlw
      New Contributor

      Hi John,

      I tried removing one of the cards and reinstalling the OPAE driver, but "lsmod | grep fpga" still behaves oddly and shows only two entries.

      My reinstallation steps are:

      sudo rm -r /inteldevstack

      cd ~/a10_gx_pac_ias_1_2_1_pv_dev_installer

      ./setup.sh

      Is there a problem with my installation steps?

      Have a good day. Thank you.

  • JohnT_Altera
    Regular Contributor

    Hi,


    May I know where you got the installation file? Are you installing using only the script? Have you tried reinstalling the PAC installation as well?


  • JohnT_Altera
    Regular Contributor

    Hi,


    Have you tried uninstalling before installing again? Do you have another system to test on?


    • wgshnyhlw
      New Contributor

      Hi John,

      What is the correct "uninstall" operation you mentioned? I uninstalled the acceleration stack with "sudo rm -r ../inteldevstack". If that was wrong, please tell me the correct way.
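Note that `rm -r` on the stack directory only deletes the files in that directory. If the setup script also installed RPM packages (an assumption about how this release installs the driver and libraries, not something confirmed in this thread), those packages stay registered with the system after the directory is gone. A minimal Python sketch that lists installed packages whose names contain "opae", using the standard `rpm -qa` query, so you can see what is still installed before deciding how to remove it with your package manager. The `opae_packages` helper name and the "opae" keyword are illustrative choices.

```python
import subprocess


def opae_packages(rpm_qa_output, keyword="opae"):
    """Return package names (one per line of `rpm -qa` output) containing keyword."""
    return sorted(
        line.strip()
        for line in rpm_qa_output.splitlines()
        if keyword in line.lower()
    )


if __name__ == "__main__":
    try:
        out = subprocess.run(
            ["rpm", "-qa"], stdout=subprocess.PIPE, universal_newlines=True, check=True
        ).stdout
    except (OSError, subprocess.CalledProcessError) as exc:
        print("could not run rpm -qa:", exc)
    else:
        for pkg in opae_packages(out):
            print(pkg)
```

An empty list would suggest the driver was not installed as an RPM on this system, in which case the installer's own documentation is the place to look for an uninstall step.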

      I only have one system that supports the acceleration card. Do you think I should reinstall the operating system?

      Thank you.