Forum Discussion

Atul_Ghalame
New Contributor
5 years ago

Valgrind fails to run Intel FPGA OpenCL design examples on Arria-10

Hi all,

I'm trying to run Intel's FPGA OpenCL design examples on an Arria-10 card. I had been using SDK environment 16.1, and the valgrind tool showed a memory leak with it, so we decided to upgrade our environment to 17.1. The BSP and compiler have been updated, and I can run the hello_world code on the new version. However, I'm not able to run it under valgrind; it shows:

MMD ERROR: Unable to find an unused signal number
Querying platform for info:
==========================
CL_PLATFORM_NAME = Intel(R) FPGA SDK for OpenCL(TM)
CL_PLATFORM_VENDOR = Intel(R) Corporation
CL_PLATFORM_VERSION = OpenCL 1.0 Intel(R) FPGA SDK for OpenCL(TM), Version 17.1

ERROR: CL_DEVICE_NOT_FOUND
Location: ../common/src/AOCLUtils/opencl.cpp:356
Query for number of devices failed

Is it a linking error or something else? Kindly suggest the possible issue or solution.

Thanks,

5 Replies

  • AnilErinch_A_Intel
    Frequent Contributor

    Hi,


    Please make sure that you have set the environment variables correctly.

    Please compare the values of the environment variables in the previously working setup with the current ones. For example, variables such as CL_CONTEXT_EMULATOR_DEVICE_ALTERA should be set properly.


    Thanks and Regards

    Anil
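One way to do the comparison suggested above is to print the relevant variables from inside a small program and run it both directly and under valgrind. The following is a minimal sketch in C. Only CL_CONTEXT_EMULATOR_DEVICE_ALTERA is named in this thread; the other variable names in the list are assumptions about a typical SDK installation and should be adjusted to the setup in question. The function name `dump_watched_env` is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Variables to compare between the working and the failing setup.
 * Only CL_CONTEXT_EMULATOR_DEVICE_ALTERA comes from this thread; the
 * remaining names are assumptions about a typical SDK installation. */
static const char *const WATCHED_VARS[] = {
    "CL_CONTEXT_EMULATOR_DEVICE_ALTERA",
    "CL_CONTEXT_EMULATOR_DEVICE_INTELFPGA",
    "ALTERAOCLSDKROOT",
    "INTELFPGAOCLSDKROOT",
    "AOCL_BOARD_PACKAGE_ROOT",
    "LD_LIBRARY_PATH",
};

/* Print each watched variable as NAME=value, or NAME=<unset>.
 * Returns the number of watched variables that are set. */
int dump_watched_env(FILE *out)
{
    int set_count = 0;
    size_t n = sizeof WATCHED_VARS / sizeof WATCHED_VARS[0];

    for (size_t i = 0; i < n; ++i) {
        const char *value = getenv(WATCHED_VARS[i]);
        if (value != NULL)
            ++set_count;
        fprintf(out, "%s=%s\n", WATCHED_VARS[i], value ? value : "<unset>");
    }
    return set_count;
}
```

Calling `dump_watched_env(stdout)` from `main`, saving the output of a direct run and of a run under valgrind, and diffing the two files should show whether any of these variables differ between the two cases.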


    • Atul_Ghalame
      New Contributor

      Hi,

      Thanks for your suggestion. I'll check that the env variables are passed to valgrind and how this worked for the previous version.

      I'm running a .aocx compiled for the board, not emulator mode, so could you please mention the relevant variables I should check?

      What does 'MMD ERROR: Unable to find an unused signal number' indicate?

      Running strace on valgrind shows 'aocl-pro-rte/host/linux64/lib/tls/x86_64/libc.so.6' being used. Is some device-specific library not set properly?

      Kind regards,

      Atul
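The thread does not explain the 'Unable to find an unused signal number' message. One plausible reading, and it is only an assumption, is that the MMD library looks for a signal with no handler installed to use for device notifications, and finds none when running under valgrind. The sketch below is not the MMD library's code; it merely shows how a process can scan the POSIX real-time signals for ones that still have the default disposition. The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>

/* Return 1 if the signal still has its default disposition, 0 if a
 * handler (or SIG_IGN) is installed or the signal cannot be queried. */
static int rt_signal_is_unused(int sig)
{
    struct sigaction current;

    if (sigaction(sig, NULL, &current) != 0)
        return 0; /* query failed; treat the signal as unavailable */
    if (current.sa_flags & SA_SIGINFO)
        return 0; /* a three-argument handler is installed */
    return current.sa_handler == SIG_DFL;
}

/* Return the highest unused real-time signal number, or -1 if none.
 * Assumption: this imitates what a driver library might do; the real
 * search order and criteria used by the MMD library are not known here. */
int find_unused_rt_signal(void)
{
    for (int sig = SIGRTMAX; sig >= SIGRTMIN; --sig) {
        if (rt_signal_is_unused(sig))
            return sig;
    }
    return -1;
}

/* Count how many real-time signals are currently unused. */
int count_unused_rt_signals(void)
{
    int count = 0;

    for (int sig = SIGRTMIN; sig <= SIGRTMAX; ++sig)
        count += rt_signal_is_unused(sig);
    return count;
}
```

Printing the results of `find_unused_rt_signal()` and `count_unused_rt_signals()` from a small `main`, once directly and once under valgrind, would show whether the set of available signals differs between the two runs. If it does not, the cause of the error lies elsewhere.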

    • Atul_Ghalame
      New Contributor

      Hi,

      I tried keeping both the Altera and Intel variants of the env variables set, but there was no change.

      The strace tool shows libnalla_pcie_mmd.so being opened multiple times under valgrind, unlike a run without valgrind.

      Is any change expected in the lines below from the board_env.xml or board_env.icd file?

      <mmdlib>%b/linux64/lib/libnalla_pcie_mmd.so</mmdlib>
      <linkflags>-L%b/linux64/lib</linkflags>
      <linklibs>-lnalla_pcie_mmd</linklibs>
      <utilbindir>%b/linux64/libexec</utilbindir>

      Please keep me posted if you have managed to run a Valgrind test on an FPGA (not the emulator) with any SDK version beyond 16.1, ideally on a 385A card.

      Thanks,

      Atul

  • AnilErinch_A_Intel
    Frequent Contributor

    Hi Atul,

    It would be best practice to analyze the host code for possible memory leaks and resolve them, since you mentioned that Valgrind had already detected some. If you can share the host code, we can have a look at it.

    Thanks and Regards

    Anil


    • Atul_Ghalame
      New Contributor

      Hi Anil,

      We were trying with an old card and a compatible environment, but later we tested SDK RTE 20.2 with the design examples available on the Intel site. The hello_world code was compiled for the emulator device and tested with the valgrind tool. When I run the host code as is, it shows:

      ==2323== LEAK SUMMARY:
      ==2323== definitely lost: 68,056 bytes in 6 blocks
      ==2323== indirectly lost: 1,485 bytes in 36 blocks
      ==2323== possibly lost: 44,570 bytes in 196 blocks
      ==2323== still reachable: 270,918 bytes in 2,458 blocks

      Our OS is CentOS 7 with Valgrind 3.15, and we observed a similar leak on another machine as well. I commented out parts of the host code, and I can see that creating the context (the line below) produces the memory leak above.

      context = clCreateContext(NULL, 1, &device, &oclContextCallback, NULL, &status);

      Could you check the hello_world code at your end and let us know your observations?

      Thanks,

      Atul
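When narrowing a leak down by commenting out host code, as described above, it can help to compare the LEAK SUMMARY figures of two runs numerically, for example with and without the clCreateContext call. The following is a minimal sketch in C that extracts the byte count for one category from a summary line in the format shown in this post. The function name `leak_bytes` is hypothetical, and the parser assumes the line layout and comma thousands separators seen above.

```c
#include <stdio.h>
#include <string.h>

/* Parse one Valgrind LEAK SUMMARY line for the given category
 * (for example "definitely lost") and return the byte count.
 * Returns -1 if the line does not report that category.
 * Comma thousands separators in the number are skipped. */
long leak_bytes(const char *line, const char *category)
{
    const char *p = strstr(line, category);

    if (p == NULL)
        return -1;
    p += strlen(category);
    if (*p != ':')
        return -1;
    ++p;
    while (*p == ' ')
        ++p;
    if (*p < '0' || *p > '9')
        return -1; /* no number follows the category label */

    long bytes = 0;
    for (; (*p >= '0' && *p <= '9') || *p == ','; ++p) {
        if (*p != ',')
            bytes = bytes * 10 + (*p - '0');
    }
    return bytes;
}
```

For the first summary line quoted above, `leak_bytes(line, "definitely lost")` yields 68056. Feeding each line of two valgrind logs through this function and subtracting the results gives the change attributable to the code that was commented out.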