Forum Discussion

PSath2
New Contributor
6 years ago

DevCloud: OpenCL kernels build after update but host runtime fails on fpga nodes with auto-discovery error

After the software update to the FPGA compile nodes described in https://software.intel.com/en-us/forums/intel-oneapi-base-toolkit/topic/843060, I am once again able to compile kernels, thanks!

However, the OpenCL runtime fails to access the pac_a10 device in the nodes with the "fpga" or "arria10" property.

UPDATE: It appears this may be specific to s001-n084, as I am able to run clinfo successfully on s001-n088 and s001-n086. There is no distinguishing property in pbsnodes to separate out the broken nodes, so a poor workaround may be to manually pick a free node and queue there IFF it functions properly.
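
The manual workaround described above could be scripted roughly as follows. This is a minimal Python sketch, not DevCloud tooling: the helper names `first_known_good` and `qsub_interactive_args` are hypothetical, the node names are the ones reported in this thread, and it assumes the scheduler accepts the usual Torque/PBS `-l nodes=<host>:ppn=<n>` resource form (the `ppn=2` default is also an assumption).

```python
from typing import Iterable, List, Optional, Set

# Nodes reported as working in this thread; adjust as the situation changes.
KNOWN_GOOD: Set[str] = {"s001-n086", "s001-n088"}


def first_known_good(free_nodes: Iterable[str],
                     known_good: Set[str] = KNOWN_GOOD) -> Optional[str]:
    """Return the first free node that is on the known-good list, or None."""
    for node in free_nodes:
        if node in known_good:
            return node
    return None


def qsub_interactive_args(node: str, ppn: int = 2) -> List[str]:
    """Build an argument list requesting an interactive job on one specific node.

    Assumes Torque/PBS-style resource syntax; ppn is a guess, not a DevCloud rule.
    """
    return ["qsub", "-I", "-l", f"nodes={node}:ppn={ppn}"]
```

The list of free nodes still has to come from somewhere (for example, from reading pbsnodes by hand), since there is no node property that separates working nodes from broken ones.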

I have tried both my own codes and a plain clinfo, and the result is the same auto-discovery error. The clinfo output is below.

@s001-n084:~$ clinfo
Number of platforms                               3
  Platform Name                                   Intel(R) FPGA Emulation Platform for OpenCL(TM)
  Platform Vendor                                 Intel(R) Corporation
  Platform Version                                OpenCL 1.0 Intel(R) FPGA SDK for OpenCL(TM), Version 19.2
  Platform Profile                                EMBEDDED_PROFILE
  Platform Extensions                             cl_khr_icd cl_khr_byte_addressable_store cl_intel_fpga_host_pipe cles_khr_int64 cl_khr_il_program
  Platform Extensions function suffix             IntelFPGA
 
  Platform Name                                   Intel(R) FPGA SDK for OpenCL(TM)
  Platform Vendor                                 Intel(R) Corporation
  Platform Version                                OpenCL 1.0 Intel(R) FPGA SDK for OpenCL(TM), Version 19.3api
  Platform Profile                                EMBEDDED_PROFILE
  Platform Extensions                             cl_khr_byte_addressable_store cles_khr_int64 cl_khr_icd
  Platform Extensions function suffix             IntelFPGA
 
  Platform Name                                   Intel(R) OpenCL
  Platform Vendor                                 Intel(R) Corporation
  Platform Version                                OpenCL 2.1 LINUX
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_khr_il_program cl_intel_unified_shared_memory cl_intel_exec_by_local_thread cl_intel_vec_len_hint cl_intel_device_partition_by_names cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer
  Platform Host timer resolution                  1ns
  Platform Extensions function suffix             INTEL
 
  Platform Name                                   Intel(R) FPGA Emulation Platform for OpenCL(TM)
Number of devices                                 1
  Device Name                                     Intel(R) FPGA Emulation Device
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x1172
  Device Version                                  OpenCL 1.0
  Driver Version                                  2019.8.10.0
  Device OpenCL C Version                         OpenCL C 1.0
  Device Type                                     Accelerator
  Device Profile                                  EMBEDDED_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Max compute units                               24
  Max clock frequency                             3400MHz
  Max work item dimensions                        3
  Max work item sizes                             67108864x67108864x67108864
  Max work group size                             67108864
  Preferred work group size multiple              128
  Preferred / native vector sizes
    char                                                 1 / 32
    short                                                1 / 16
    int                                                  1 / 8
    long                                                 1 / 4
    half                                                 0 / 0        (n/a)
    float                                                1 / 8
    double                                               1 / 4        (n/a)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (n/a)
  Address bits                                    64, Little-Endian
  Global memory size                              202518421504 (188.6GiB)
  Error Correction support                        No
  Max memory allocation                           50629605376 (47.15GiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        262144 (256KiB)
  Global Memory cache line size                   64 bytes
  Image support                                   No
  Local memory type                               Global
  Local memory size                               262144 (256KiB)
  Max number of constant args                     480
  Max constant buffer size                        131072 (128KiB)
  Max size of kernel argument                     3840 (3.75KiB)
  Queue properties
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Profiling timer resolution                      1ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
    IL version                                    SPIR-V_1.0
  Device Extensions                               cl_khr_icd cl_khr_byte_addressable_store cl_intel_fpga_host_pipe cles_khr_int64 cl_khr_il_program
 
  Platform Name                                   Intel(R) FPGA SDK for OpenCL(TM)
Number of devices                                 1
FAILED to read auto-discovery string at byte 18446744073709551615. Full auto-discovery string value is
 
acl_hal_mmd.cpp:1426:assert failure: Failed to initialize kernel interfaceclinfo: acl_hal_mmd.cpp:1426: int l_try_device(unsigned int, const char*, acl_system_def_t*, acl_mmd_dispatch_t*): Assertion `0' failed.
Aborted
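
To check a node before committing a long job to it, the failure above can be detected from the clinfo output. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, the marker strings are copied from the log in this post, and other failure modes would not be caught. As a side note, the byte offset 18446744073709551615 in the message equals 2**64 - 1, which is what -1 looks like as an unsigned 64-bit value; that may point to a failed lookup rather than a real offset, but that is only a guess.

```python
import subprocess

# Failure messages copied from the clinfo output on s001-n084.
FAILURE_MARKERS = (
    "FAILED to read auto-discovery string",
    "Failed to initialize kernel interface",
)

# The reported byte offset is the largest unsigned 64-bit value.
assert 2**64 - 1 == 18446744073709551615


def clinfo_looks_broken(output: str) -> bool:
    """Return True if the text contains one of the known failure markers."""
    return any(marker in output for marker in FAILURE_MARKERS)


def probe_node() -> bool:
    """Run clinfo on the current node; True means it exited cleanly with no marker.

    Raises FileNotFoundError if clinfo is not on PATH.
    """
    result = subprocess.run(["clinfo"], capture_output=True, text=True)
    combined = result.stdout + result.stderr
    return result.returncode == 0 and not clinfo_looks_broken(combined)
```

Both stdout and stderr are scanned because the assertion message is likely written to stderr, and an aborted process also returns a non-zero exit status.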

aocl diagnose reports that the BSP is installed correctly (output attached as a follow-up comment due to post length).

My environment is essentially default, other than a bash function that overrides python to python2 (also attached as a follow-up comment due to post length).

12 Replies

  • PSath2
    New Contributor

    aocl diagnose output

    @s001-n084:~$ aocl diagnose
    --------------------------------------------------------------------
    ICD System Diagnostics
    --------------------------------------------------------------------
     
    Using the following location for ICD installation:
            /etc/OpenCL/vendors
     
    Found 4 icd entry at that location:
            /etc/OpenCL/vendors/Altera.icd
            /etc/OpenCL/vendors/intel-cpu.icd
            /etc/OpenCL/vendors/Intel_FPGA_SSG_Emulator.icd
            /etc/OpenCL/vendors/intel-neo.icd
     
    the following OpenCL libraries are referenced in the icd files:
            libalteracl.so
            libintelocl.so
            libintelocl_emu.so
            libigdrcl.so
     
    checking LD_LIBRARY_PATH for registered libraries:
            libalteracl.so was registered on the system at /opt/intel/inteloneapi/compiler/2021.1-beta03/linux/lib/oclfpga/host/linux64/lib
            libintelocl.so was registered on the system at /opt/intel/inteloneapi/compiler/latest/linux/lib/x64
            libintelocl_emu.so was registered on the system at /opt/intel/inteloneapi/compiler/2021.1-beta03/linux/lib/oclfpga/host/linux64/lib
            libigdrcl.so was registered on the system at /opt/intel/inteloneapi/compiler/latest/linux/lib/oclgpu
     
    Using the following location for fcd installations:
            /opt/Intel/OpenCLFPGA/oneAPI/Boards
     
    Found 1 fcd entry at that location:
            /opt/Intel/OpenCLFPGA/oneAPI/Boards/dcp_bsp.fcd
     
    the following OpenCL libraries are referenced in the fcd files:
            /opt/intel/inteloneapi/compiler/2021.1-beta03/linux/lib/oclfpga/board/intel_a10gx_pac/linux64/lib/libintel_opae_mmd.so
     
    checking LD_LIBRARY_PATH for registered libraries:
            /opt/intel/inteloneapi/compiler/2021.1-beta03/linux/lib/oclfpga/board/intel_a10gx_pac/linux64/lib/libintel_opae_mmd.so was registered on the system.
     
    Number of Platforms = 3
            1. Intel(R) FPGA Emulation Platform for OpenCL(TM)              | Intel(R) Corporation           | OpenCL 1.0 Intel(R) FPGA SDK for OpenCL(TM), Version 19.2
            2. Intel(R) FPGA SDK for OpenCL(TM)                             | Intel(R) Corporation           | OpenCL 1.0 Intel(R) FPGA SDK for OpenCL(TM), Version 19.3api
            3. Intel(R) OpenCL                                              | Intel(R) Corporation           | OpenCL 2.1 LINUX
    --------------------------------------------------------------------
    ICD diagnostics PASSED
    --------------------------------------------------------------------
    --------------------------------------------------------------------
    BSP Diagnostics
    --------------------------------------------------------------------
    --------------------------------------------------------------------
    Device Name:
    acl0
     
    BSP Install Location:
    /opt/intel/inteloneapi/compiler/2021.1-beta03/linux/lib/oclfpga/board/intel_a10gx_pac
     
    Vendor: Intel Corp
     
    Physical Dev Name   Status            Information
     
    pac_ee00000         Passed            Intel PAC Platform (pac_ee00000)
                                          PCIe 94:00.0
                                          FPGA temperature = 44 degrees C.
     
    DIAGNOSTIC_PASSED
    --------------------------------------------------------------------
     
    Call "aocl diagnose <device-names>" to run diagnose for specified devices
    Call "aocl diagnose all" to run diagnose for all devices
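
For nodes with more than one card, the per-device status can be pulled out of the aocl diagnose text rather than read by eye. This is a minimal Python sketch, assuming the table layout shown above (a "Physical Dev Name" header, columns separated by two or more spaces, and a "DIAGNOSTIC_" line closing each block); `parse_device_status` is a hypothetical helper, not part of the SDK.

```python
import re
from typing import Dict


def parse_device_status(diagnose_output: str) -> Dict[str, str]:
    """Map each physical device name to its status column.

    Only rows between a "Physical Dev Name" header and the next
    "DIAGNOSTIC_" line are considered, so the ICD platform list is ignored.
    """
    statuses: Dict[str, str] = {}
    in_table = False
    for raw in diagnose_output.splitlines():
        line = raw.strip()
        if line.startswith("Physical Dev Name"):
            in_table = True
            continue
        if line.startswith("DIAGNOSTIC_"):
            in_table = False
            continue
        if not in_table or not line:
            continue
        # Device rows have name, status, and information columns;
        # continuation rows (PCIe address, temperature) have only one.
        columns = re.split(r"\s{2,}", line)
        if len(columns) >= 3:
            statuses[columns[0]] = columns[1]
    return statuses
```

Filtering the result for anything other than "Passed" gives a quick list of cards to avoid.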
  • Lawrence_L_Intel
    Occasional Contributor

    Hi Paul

    Letting you know this is on our list of things to look at. Note that for the last couple of days we had a problem with the license server, so if any step in your flow runs a Quartus FPGA compile under the hood, it could have failed and caused downstream problems. Can you verify that you are still having issues?

    Thanks

    Larry

    • PSath2
      New Contributor

      We can compile OpenCL kernels fine on the fpga_compile nodes thanks to that update. We can also compile and run host codes on several of the "fpga" nodes. (So far 86 and 88 are known good. I haven't yet run into a situation where I couldn't get onto one of the two in order to look at the others.)

      Upon further testing, the issue noted above appears specific to s001-n084. When testing it this morning, we were also unable to automatically find the OpenCL headers there, as we can on nodes 86 and 88. Perhaps it can be offlined until fixed, so that we can continue to queue for any node with the "fpga" property without landing on 84, rather than having to manually pick the known-good nodes?

  • Lawrence_L_Intel
    Occasional Contributor

    Hi Paul

    Can you try this setup script: /data/intel_fpga/devcloudLoginToolSetup.sh

    Then do tools_setup

    and select the devstack for Arria 10 or Stratix 10.

    We haven't been testing the oneAPI version. Note that kernel downloads are only available on n137, n138, and n139, or n189.

    Let me know if this works.

    Thanks

    Larry

  • PSath2
    New Contributor

    Hi Larry,

    Apologies for the delayed response; we had another critical deadline that drew my full attention over the last week.

    As far as the original forum topic goes, we were able to compile successfully on the main queue's fpga_compile nodes, and then able to run those aocx implementations on nodes 88 and 86 without any issues (using standard OpenCL C and C++ host codes).

    As for the beta queue, I've got our .cl implementations in the compile queue for s001-n137 using option #5, the Arria 10 stack. We get the error below pretty early on, but the build seems to be proceeding nonetheless. I will keep you updated.

    ....

    aoc: Selected default target board pac_a10

    Inconsistency detected by ld.so: dl-close.c: 811: _dl_close: Assertion `map->l_init_called' failed!

    aoc: Running OpenCL parser....

    ...

    As an aside, I see with aocl diagnose that the 0th and 2nd PAC A10 devices show status "Passed", while the 1st (acl1) shows "Uninitialized". If we can get compiled and running on at least two cards in the beta queue, it would give us an interesting data point: one of our test codes currently has to swap two cl_programs back and forth because of BRAM constraints on the single-device nodes in the default queue, and with two devices we could move only the data back and forth rather than reconfiguring the Arrias :)

    • PSath2
      New Contributor

      Update:

      All 3 kernel files grind for some time before eventually failing in a10_partial_reconfig/flow.tcl with the below error. (These builds were all performed on s001-n137)

      Checking if memory usage is larger than 100%
      remove outer_zero_and_others.1.bc
      remove area_src.json
      remove loops.json
      remove summary.json
      remove lmv.json
      remove mav.json
      remove info.json
      remove warnings.json
      remove area.html
      remove area.json
      remove outer_zero_and_others.bc
      /glob/development-tools/versions/fpgasupportstack/a10/1.2/inteldevstack/intelFPGA_pro/hld/linux64/bin/system_integrator   --bsp-flow green_top /glob/development-tools/versions/fpgasupportstack/a10/1.2/inteldevstack/a10_gx_pac_ias_1_2_pv/opencl/opencl_bsp/hardware/pac_a10/board_spec.xml "outer_zero_and_others.bc.xml" none kernel_system.tcl
      #aoc: First stage compilation completed successfully.
      Compiling for FPGA. This process may take a long time, please be patient.
      qsys-script --quartus-project=dcp --script=kernel_system.tcl -Xmx512M -XX:+UseSerialGC
      echo
      bash build/run.sh
      Error (213009): File name "output_files/afu_fit.green_region.pmsf" does not exist or can't be read
      Error: Quartus Prime Convert_programming_file was unsuccessful. 1 error, 0 warnings
      Error (23031): Evaluation of Tcl script a10_partial_reconfig/flow.tcl unsuccessful
      Error: Quartus Prime Shell was unsuccessful. 7 errors, 3092 warnings
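
With thousands of warnings in a build log like this, it can help to pull out only the error lines and their numeric codes before searching the forums. Below is a minimal Python sketch, assuming errors follow the two shapes visible in the log above ("Error (NNNNN): ..." and "Error: ..."); `extract_errors` and `first_coded_error` are hypothetical helper names.

```python
import re
from typing import List, Optional, Tuple

# Matches "Error (213009): message" and "Error: message".
ERROR_LINE = re.compile(r"^\s*Error(?: \((\d+)\))?: (.*)$")


def extract_errors(log_text: str) -> List[Tuple[Optional[int], str]]:
    """Return (code, message) pairs in log order; code is None when absent."""
    errors: List[Tuple[Optional[int], str]] = []
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line)
        if match:
            code = int(match.group(1)) if match.group(1) else None
            errors.append((code, match.group(2)))
    return errors


def first_coded_error(log_text: str) -> Optional[Tuple[int, str]]:
    """Return the earliest error that carries a numeric code, if any."""
    for code, message in extract_errors(log_text):
        if code is not None:
            return code, message
    return None
```

The earliest coded error is often the most useful search term, since later errors tend to be consequences of it, though that is a rule of thumb rather than a guarantee.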
  • MEIYAN_L_Intel
    Frequent Contributor

    Hi,

    From the error:

    Error (213009): File name "output_files/afu_fit.green_region.pmsf" does not exist or can't be read.

    This is the same error as in https://forums.intel.com/s/question/0D50P00004ZMIykSAH/error-213009-file-name-outputfilesafufitgreenregionpmsf-does-not-exist-or-cant-be-read, which was a license issue with Intel Acceleration Stack v1.2.

    In that case, the error was caused by an expired license for the Ethernet IP.

    I will check with the developers about this issue on DevCloud.

    Thanks

    • agond2
      New Contributor

      Hi Larry,

      Confirming that I was able to compile and run OpenCL kernels on nodes s001-n137, n138, and n139. Thanks a lot for the patch.

      On node s001-n189, which has a Stratix 10 FPGA, I ran the tools_setup command and selected option 6 for the Stratix 10 development stack, but I get the error "Error: Compiler Error, not able to generate hardware".

      -Atharva

      Quartus_sh_compile.log on node s001-n189:

      This is the run.sh script.
      ERROR: packager tool failed to run.  Check installation.  Aborting compilation!