benchmark and classification_sample apps hang on starting inference when running with -d HETERO:FPGA,CPU.

Question

PAC installed in Artesyn MC1600 chassis with Intel(R) Xeon(R) CPU D-1567 @ 2.10GHz running CentOS 7.5.

fpgainfo fme:

Board Management Controller, microcontroller FW version 26889
Last Power Down Cause: POK_CORE
Last Reset Cause: None
//****** FME ******//
Object Id                     : 0xEF00000
PCIe s:b:d:f                  : 0000:06:00:0
Device Id                     : 0x09C4
Socket Id                     : 0x00
Ports Num                     : 01
Bitstream Id                  : 0x123000200000185
Bitstream Version             : 0x30201
Pr Interface Id               : 69528db6-eb31-577a-8c36-68f9faa081f6

Prior to running the inference, this bitstream was programmed:

aocl program acl0 /opt/intel/openvino/bitstreams/a10_dcp_bitstreams/2019R1_RC_FP11_ResNet_SqueezeNet_VGG.aocx

The classification_sample and benchmark apps run without issue with the target device set to CPU. Both applications hang when attempting to run on the FPGA (with -d HETERO:FPGA,CPU). Inference on the FPGA usually completes successfully with a single iteration (-ni 1) but consistently hangs with a higher number of iterations.

# ./classification_sample -d HETERO:FPGA,CPU -ni 10 -i /opt/intel/openvino/deployment_tools/demo/car.png -m /root/openvino_models/ir/FP32/classification/squeezenet/1.1/caffe/squeezenet1.1.xml
[ INFO ] InferenceEngine:
        API version ............ 1.6
        Build .................. custom_releases/2019/R1.1_28dfbfdd28954c4dfd2f94403dd8dfc1f411038b
[ INFO ] Parsing input parameters
[ INFO ] Files were added: 1
[ INFO ]     /opt/intel/openvino/deployment_tools/demo/car.png
[ INFO ] Loading plugin
 
        API version ............ 1.6
        Build .................. heteroPlugin
        Description ....... heteroPlugin
[ INFO ] Loading network files:
        /root/openvino_models/ir/FP32/classification/squeezenet/1.1/caffe/squeezenet1.1.xml
        /root/openvino_models/ir/FP32/classification/squeezenet/1.1/caffe/squeezenet1.1.bin
[ INFO ] Preparing input blobs
[ WARNING ] Image is resized from (787, 259) to (227, 227)
[ INFO ] Batch size is 1
[ INFO ] Preparing output blobs
[ INFO ] Loading model to the plugin
[ INFO ] Starting inference (10 iterations)

# ./benchmark_app -d HETERO:FPGA,CPU -i /opt/intel/openvino/deployment_tools/demo/car.png -m /root/openvino_models/ir/FP32/classification/squeezenet/1.1/caffe/squeezenet1.1.xml
[ INFO ] InferenceEngine:
        API version ............ 1.6
        Build .................. custom_releases/2019/R1.1_28dfbfdd28954c4dfd2f94403dd8dfc1f411038b
 
[Step 1/8] Parsing and validation of input args
[ INFO ] Parsing input parameters
[ INFO ] Files were added: 1
[ INFO ]     /opt/intel/openvino/deployment_tools/demo/car.png
Progress: [....................] 100.00% done
 
[Step 2/8] Loading plugin
[ INFO ]
        API version ............ 1.6
        Build .................. heteroPlugin
        Description ....... heteroPlugin
Progress: [....................] 100.00% done
 
[Step 3/8] Read IR network
[ INFO ] Loading network files
[ INFO ] Network batch size: 1, precision: FP32
Progress: [....................] 100.00% done
 
[Step 4/8] Configure input & output of the model
[ INFO ] Preparing output blobs
Progress: [....................] 100.00% done
 
[Step 5/8] Loading model to the plugin
Progress: [....................] 100.00% done
 
[Step 6/8] Create infer requests and fill input blobs with images
[ INFO ] Infer Request 0 created
[ INFO ] Network Input dimensions (NCHW): 1 3 227 227
[ INFO ] Prepare image /opt/intel/openvino/deployment_tools/demo/car.png
[ WARNING ] Image is resized from (787, 259) to (227, 227)
[ INFO ] Infer Request 1 created
[ INFO ] Network Input dimensions (NCHW): 1 3 227 227
[ INFO ] Prepare image /opt/intel/openvino/deployment_tools/demo/car.png
[ WARNING ] Image is resized from (787, 259) to (227, 227)
Progress: [....................] 100.00% done
 
[Step 7/8]
Start inference asynchronously (120000.00 ms duration, 2 inference requests in parallel)
Progress: [                    ] 0.00% done

jonway_altera · Answer

Hi @mkont1, could you elaborate on what "hangs" means here? Can you recover with Ctrl+C, or do you need to reboot?

As a sanity check, does a cold reset (power cycle) of the server resolve the issue?

Upon every reboot or new terminal:

Make sure that you have initialized the card.

Make sure that you have set the hugepages. Allocate 20 hugepages of 2 MB each per card.

Did the PAC pass fpgabist? You may refer to the link below (keyword "Running FPGA Diagnostics"): https://www.intel.com/content/www/us/en/programmable/documentation/iyu1522005567196.html

Did the PAC pass aocl diagnose acl0? You may refer to: https://www.intel.com/content/www/us/en/programmable/documentation/fvf1521490619217.html#zru1523293789016

Could you run the command below? I want to check that you have the correct OPAE version.

rpm -qa | grep opae

Does this fail with 2019R1_RC_FP11_ResNet_SqueezeNet_VGG only, or does it fail with other AOCX files as well?

Could you try switching to another aocx with lower FP?

In summary, test as I suggest above first:

reboot --> initialize --> set hugepages --> fpgabist --> aocl diagnose acl0 --> change to another aocx --> change to lower FP.

If the failure persists, please provide the OS/kernel version and all the results you see from the above tests:

cat /etc/*elease
uname -r

Thanks
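The check order suggested above could be scripted roughly as follows. This is only a sketch: the commands are the ones quoted in this thread, the names `run_step` and `preflight` are made up for illustration, and the card initialization step is left as a comment because its exact command is not given here. By default the script only prints what it would run; set DRY_RUN=0 to actually execute the steps.

```shell
#!/bin/sh
# Sketch of the suggested check order. DRY_RUN=1 (the default) only prints
# each command; DRY_RUN=0 actually runs it and stops at the first failure.
DRY_RUN="${DRY_RUN:-1}"

# run_step LABEL COMMAND... (hypothetical helper, not part of OPAE or aocl)
run_step() {
    label="$1"
    shift
    echo "== $label: $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@" || { echo "FAILED: $label" >&2; return 1; }
    fi
}

preflight() {
    # Card initialization goes here first; the exact command depends on the
    # installed acceleration stack and is not specified in this thread.
    run_step "hugepages" sudo sh -c \
        "echo 20 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages" &&
    run_step "fpgabist" sudo fpgabist \
        "$OPAE_PLATFORM_ROOT/hw/samples/nlb_mode_3/bin/nlb_mode_3.gbs" &&
    run_step "aocl diagnose" aocl diagnose acl0 &&
    run_step "opae version" sh -c "rpm -qa | grep opae" &&
    run_step "os release" sh -c "cat /etc/*elease" &&
    run_step "kernel" uname -r
}

preflight
```

Keeping the steps in one chained function means the first failing check is the last thing printed, which makes it easier to report where the sequence stopped.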

jonway_altera · Answer

Hi @mkont1, would you perform a quick test? The demo cannot run with the default batch size when running on the FPGA. Please change the batch size to more than 1 (e.g. -b 10).

mkont1 · Answer

Can recover with Ctrl+C.

Issue persists after power cycle.

Hugepages set with:

sudo sh -c "echo 20 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages"

Output of fpgabist:

# sudo fpgabist $OPAE_PLATFORM_ROOT/hw/samples/nlb_mode_3/bin/nlb_mode_3.gbs
==========================================================
 
Beginning FPGA Built-In Self-Test
 
==========================================================
Device: bus = 6, device = , func =
Board Management Controller, microcontroller FW version 26889
Last Power Down Cause: POK_CORE
Last Reset Cause: External reset
Power-on-reset
//****** FME ******//
Object Id                     : 0xF000000
PCIe s:b:d:f                  : 0000:06:00:0
Device Id                     : 0x09C4
Socket Id                     : 0x00
Ports Num                     : 01
Bitstream Id                  : 0x123000200000185
Bitstream Version             : 0x30201
Pr Interface Id               : 69528db6-eb31-577a-8c36-68f9faa081f6
Board Management Controller, microcontroller FW version 26889
Last Power Down Cause: POK_CORE
Last Reset Cause: None
//****** PORT ******//
Object Id                     : 0xEF00000
PCIe s:b:d:f                  : 0000:06:00:0
Device Id                     : 0x09C4
Socket Id                     : 0x00
Ports Num                     : 01
Bitstream Id                  : 0x123000200000185
Bitstream Version             : 0x30201
Pr Interface Id               : 69528db6-eb31-577a-8c36-68f9faa081f6
Accelerator Id                : 18b79ffa-2ee5-4aa0-96ef-4230dafacb5f
Board Management Controller, microcontroller FW version 26889
Last Power Down Cause: POK_CORE
Last Reset Cause: None
//****** TEMP ******//
Object Id                     : 0xF000000
PCIe s:b:d:f                  : 0000:06:00:0
Device Id                     : 0x09C4
Socket Id                     : 0x00
Ports Num                     : 01
Bitstream Id                  : 0x123000200000185
Bitstream Version             : 0x30201
Pr Interface Id               : 69528db6-eb31-577a-8c36-68f9faa081f6
(11) FPGA Core TEMP           : 58.00 °C
(12) Board TEMP               : 47.00 °C
(14) QSFP TEMP                : No reading (reading state unavailable)
(15) Core Supply Temp         : 65.28 °C
Board Management Controller, microcontroller FW version 26889
Last Power Down Cause: POK_CORE
Last Reset Cause: None
//****** POWER ******//
Object Id                     : 0xF000000
PCIe s:b:d:f                  : 0000:06:00:0
Device Id                     : 0x09C4
Socket Id                     : 0x00
Ports Num                     : 01
Bitstream Id                  : 0x123000200000185
Bitstream Version             : 0x30201
Pr Interface Id               : 69528db6-eb31-577a-8c36-68f9faa081f6
( 0) Total Input Power        : 28.50 Watts
( 1) PCIe 12V Current         : 2.47 Amps
( 2) PCIe 12V Voltage         : 11.20 Volts
( 3) 1.2V Voltage             : 1.22 Volts
( 4) 1.2V Current             : 2.66 Amps
( 5) 1.8V Voltage             : 1.83 Volts
( 6) 1.8V Current             : 2.73 Amps
( 7) 3.3V Mgmt Voltage        : 3.34 Volts
( 8) 3.3V Current             : 0.54 Amps
( 9) FPGA Core Voltage        : 0.91 Volts
(10) FPGA Core Current        : 13.11 Amps
(13) QSFP P3V3                : No reading (reading state unavailable)
(16) Core Supply Temp Input   : 0.50 Volts
(17) VCCR Voltage             : 1.04 Volts
(18) VCCT Voltage             : 1.04 Volts
(19) VCCR Current             : 1.12 Amps
(20) VCCT Current             : 0.12 Amps
(21) VPP Voltage              : 2.53 Volts
(22) VTT Voltage              : 0.59 Volts
Board Management Controller, microcontroller FW version 26889
Last Power Down Cause: POK_CORE
Last Reset Cause: None
//****** PORT ERRORS ******//
Object Id                     : 0xEF00000
PCIe s:b:d:f                  : 0000:06:00:0
Device Id                     : 0x09C4
Socket Id                     : 0x00
Ports Num                     : 01
Bitstream Id                  : 0x123000200000185
Bitstream Version             : 0x30201
Pr Interface Id               : 69528db6-eb31-577a-8c36-68f9faa081f6
Accelerator Id                : 18b79ffa-2ee5-4aa0-96ef-4230dafacb5f
First Error                   : 0x0
First Malformed Req           : 0xFFFFFFFFFFFFFFFF
Errors                        : 0x0
Board Management Controller, microcontroller FW version 26889
Last Power Down Cause: POK_CORE
Last Reset Cause: None
//****** FME ERRORS ******//
Object Id                     : 0xF000000
PCIe s:b:d:f                  : 0000:06:00:0
Device Id                     : 0x09C4
Socket Id                     : 0x00
Ports Num                     : 01
Bitstream Id                  : 0x123000200000185
Bitstream Version             : 0x7FFF00030201
Pr Interface Id               : 69528db6-eb31-577a-8c36-68f9faa081f6
First Error                   : 0x0
Next Error                    : 0x0
Errors                        : 0x0
PCIe1 Errors                  : 0x0
Nonfatal Errors               : 0x0
Inject Error                  : 0x0
Catfatal Errors               : 0x0
PCIe0 Errors                  : 0x0
Running mode: nlb_3
Attempting Partial Reconfiguration:
Reading bitstream
Looking for slot
Found slot
Programming bitstream
Writing bitstream
Done
Running fpgadiag read test...

Cachelines Read_Count Write_Count Cache_Rd_Hit Cache_Wr_Hit Cache_Rd_Miss Cache_Wr_Miss   Eviction 'Clocks(@200 MHz)'   Rd_Bandwidth   Wr_Bandwidth
      1024  544035292           0            0            0             0             0          0       1000011426     6.964 GB/s     0.000 GB/s
 
VH0_Rd_Count VH0_Wr_Count VH1_Rd_Count VH1_Wr_Count VL0_Rd_Count VL0_Wr_Count
           0            0            0            0            0            0
 
Running fpgadiag write test...

Cachelines Read_Count Write_Count Cache_Rd_Hit Cache_Wr_Hit Cache_Rd_Miss Cache_Wr_Miss   Eviction 'Clocks(@200 MHz)'   Rd_Bandwidth   Wr_Bandwidth
      1024          0      762732            0            0             0             0          0       1000018957     0.000 GB/s     0.010 GB/s
 
VH0_Rd_Count VH0_Wr_Count VH1_Rd_Count VH1_Wr_Count VL0_Rd_Count VL0_Wr_Count
           0            0            0            0            0            0
 
Running fpgadiag trput test...

Cachelines Read_Count Write_Count Cache_Rd_Hit Cache_Wr_Hit Cache_Rd_Miss Cache_Wr_Miss   Eviction 'Clocks(@200 MHz)'   Rd_Bandwidth   Wr_Bandwidth
      1024  488225340   489909832            0            0             0             0          0       1000023141     6.249 GB/s     6.271 GB/s
 
VH0_Rd_Count VH0_Wr_Count VH1_Rd_Count VH1_Wr_Count VL0_Rd_Count VL0_Wr_Count
           0            0            0            0            0            0
 
Finished Executing NLB (FPGA DIAG)Tests

Built-in Self-Test Completed.

aocl diagnose:

# aocl diagnose
--------------------------------------------------------------------
Device Name:
acl0
 
BSP Install Location:
/root/intelrtestack/a10_gx_pac_ias_1_2_pv/opencl/opencl_bsp
 
Vendor: Intel Corp
 
Physical Dev Name   Status            Information
 
pac_a10_ef00000     Passed            PAC Arria 10 Platform (pac_a10_ef00000)
                                      PCIe 06:00.0
                                      FPGA temperature = 61 degrees C.
 
DIAGNOSTIC_PASSED
--------------------------------------------------------------------
 
Call "aocl diagnose <device-names>" to run diagnose for specified devices
Call "aocl diagnose all" to run diagnose for all devices

aocl diagnose acl0 gets stuck (recover with Ctrl+C):

# aocl diagnose acl0
Using platform: Intel(R) FPGA SDK for OpenCL(TM)
Using Device with name: pac_a10 : PAC Arria 10 Platform (pac_a10_ef00000)
Using Device from vendor: Intel Corp
clGetDeviceInfo CL_DEVICE_GLOBAL_MEM_SIZE = 8589934592
clGetDeviceInfo CL_DEVICE_MAX_MEM_ALLOC_SIZE = 8589934592
Allocated 8589934592 bytes
Actual maximum buffer size = 8589934592 bytes
Writing 8192 MB to global memory ...
Allocated 1073741824 Bytes host buffer for large transfers
Write speed: 6917.17 MB/s [6912.93 -> 6919.78]
Reading and verifying 8192 MB from global memory ...
Read speed: 6648.18 MB/s [6541.27 -> 6688.25]
Successfully wrote and readback 8192 MB buffer
 
Poll(interrupt) timeout

rpm -qa | grep opae:

# rpm -qa | grep opae
opae-libs-1.1.2-1.x86_64
opae-tools-1.1.2-1.x86_64
opae-intel-fpga-driver-1.1.2-1.x86_64
opae-tools-extra-1.1.2-1.x86_64
opae-devel-1.1.2-1.x86_64
opae-ase-1.1.2-1.x86_64

OS and kernel versions:

# cat /etc/*elease
 
Board               : PCIECARD
Release             : Distro OS
Version             : 2.0.2
Build-Date          : 24 January 2019
Kernel-Arch         : x86_64
Linux-Distribution  : CentOS.7.5.1804
CentOS Linux release 7.5.1804 (Core)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
 
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
 
Board               :   PCIECARD
Release             :   PCIe Manager
Version             :   2.0.2
Build-Date          :   11 December 2018
Kernel-Arch         :   x86_64
Kernel-Version      :   3.10.0-862.11.6.1.el7
Linux-Distribution  :   CentOS.7.5.1804
CentOS Linux release 7.5.1804 (Core)
CentOS Linux release 7.5.1804 (Core)
 
# uname -r
3.10.0-862.11.6.1.el7.x86_64

Issue persists with 2019R1_RC_FP16_ResNet_SqueezeNet_VGG.aocx. I don't have an aocx with lower FP than 11.
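To check whether the hang depends on the bitstream, each .aocx in the bitstream directory could be tried in turn. A rough sketch, assuming the directory and the `aocl program acl0` command quoted in the question; the function name `list_program_cmds` is hypothetical, and it only prints the commands rather than programming the card.

```shell
#!/bin/sh
# Print one "aocl program" command per .aocx file found in a directory.
# Nothing is programmed; copy the lines you want to run.
list_program_cmds() {
    dir="${1:-/opt/intel/openvino/bitstreams/a10_dcp_bitstreams}"
    for f in "$dir"/*.aocx; do
        # Skip the unexpanded pattern when the directory has no .aocx files.
        [ -e "$f" ] || continue
        echo "aocl program acl0 $f"
    done
}

list_program_cmds
```

After programming each bitstream, the same classification_sample or benchmark_app command can be rerun to see whether the behaviour changes.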

mkont1 · Answer

Tried this with the benchmark_app. It didn't help.

jonway_altera · Answer

Hi @mkont1, would you try -b 10 and -niter 100?
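One way to run the suggested test so that a hang ends on its own instead of needing Ctrl+C is to wrap it in coreutils `timeout`. A sketch, assuming `timeout` is available on the host; `run_with_limit` is a hypothetical helper name, the 300-second limit in the example is arbitrary, and the benchmark_app arguments are the ones from the question plus -b 10 and -niter 100.

```shell
#!/bin/sh
# run_with_limit SECONDS COMMAND...
# Sends SIGINT (the same signal as Ctrl+C) once the limit is reached.
# timeout exits with status 124 when the limit was hit.
run_with_limit() {
    limit="$1"
    shift
    timeout -s INT "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$status"
}

# Example invocation (commented out so the sketch has no side effects):
# run_with_limit 300 ./benchmark_app -d HETERO:FPGA,CPU -b 10 -niter 100 \
#     -i /opt/intel/openvino/deployment_tools/demo/car.png \
#     -m /root/openvino_models/ir/FP32/classification/squeezenet/1.1/caffe/squeezenet1.1.xml
```

An exit status of 124 then distinguishes "still hung at the limit" from a run that finished or failed on its own.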
