Hi Kian,
We ran some experiments, and below are some observations that might be useful for root-causing the issue.
In the design example provided with the AI design suite, the HPS DDR4 memory is interfaced using the following I/O ports:
// HPS Memory
input wire emif_ref_clk,
input wire hps_memory_oct_rzqin,
output wire [0:0] hps_memory_mem_ck,
output wire [0:0] hps_memory_mem_ck_n,
output wire [16:0] hps_memory_mem_a,
output wire [0:0] hps_memory_mem_act_n,
output wire [1:0] hps_memory_mem_ba,
output wire [0:0] hps_memory_mem_bg,
output wire [0:0] hps_memory_mem_cke,
output wire [0:0] hps_memory_mem_cs_n,
output wire [0:0] hps_memory_mem_odt,
output wire [0:0] hps_memory_mem_reset_n,
output wire [0:0] hps_memory_mem_par,
input wire [0:0] hps_memory_mem_alert_n,
inout wire [3:0] hps_memory_mem_dqs,
inout wire [3:0] hps_memory_mem_dqs_n,
inout wire [31:0] hps_memory_mem_dq,
inout wire [3:0] hps_memory_mem_dbi_n,
The above signal definitions do not match the 1 GB DDR4 memory (256 Mb x 40, single rank) fitted on the board.
We updated the memory interface to match the provided 1 GB DDR4 (256 Mb x 40, single rank) device and, following the Arria 10 SoC GSRD golden example, we are now able to boot the Arria 10 SoC board.
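For reference, the data-path port widths after our update are sketched below. This is only an illustrative excerpt, assuming the 40-bit interface is built from five x8 DDR4 devices (32 data bits plus 8 ECC bits, one DQS pair and one DBI pin per device); the clock, address, command, and control port widths were left as generated by the EMIF IP.
// HPS Memory - revised data-path widths (illustrative sketch, five x8 devices)
inout wire [4:0]  hps_memory_mem_dqs,    // one DQS pair per x8 device
inout wire [4:0]  hps_memory_mem_dqs_n,
inout wire [39:0] hps_memory_mem_dq,     // 32 data bits + 8 ECC bits
inout wire [4:0]  hps_memory_mem_dbi_n,  // one DBI pin per byte lane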
However, we are still unable to run the demo example. While running the demo app, we get the DLA timeout error shown below (we tried both M2M and S2M and see the same behavior in each case).
root@arria10-a2524a6b645b:~/app# ./dla_benchmark -b=1 -cm $compiled_model -d=HETERO:FPGA,CPU -i $imgdir -niter=5 -plugins_xml_file ./plugins.xml -arch_file $archfile -api=async -groundtruth_loc $imgdir/TF_ground_truth.txt -perf_est -nireq=4 -bgr
[Step 1/12] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[ INFO ] Found 1 compiled graph
[ INFO ] Using custom plugins xml file - ./plugins.xml
[ INFO ] Network is compiled
[ INFO ] Printing summary of arguments being used by dla_benchmark
[ INFO ] API (-api) ........................... async
[ INFO ] Device (-d) .......................... HETERO:FPGA,CPU
[ INFO ] Batch size (-b) ...................... 1
[ INFO ] Compiled model (-cm) ................. /home/root/resnet-50-tf/RN50_Performance_b1.bin
[ INFO ] Num iterations (-niter) .............. 5
[ INFO ] Input images directory (-i) .......... /home/root/resnet-50-tf/sample_images
[ INFO ] Num CPU threads (-nthreads) .......... Not specified
[ INFO ] Architecture file (-arch_file) ....... /home/root/resnet-50-tf/A10_Performance.arch
[ INFO ] Num inference requests (-nireq) ...... 4
[ INFO ] Plugins file (-plugins_xml_file) ..... ./plugins.xml
[ INFO ] Groundtruth file (-groundtruth_loc) .. /home/root/resnet-50-tf/sample_images/TF_ground_truth.txt
[ INFO ] Reverse input image channels (-bgr) .. True
[ INFO ] Reading /home/root/resnet-50-tf/sample_images for graph index 0
[ WARNING ] -nstreams default value is determined automatically for a device.
Although the automatic selection usually provides a reasonable performance,
but it still may be non-optimal for some cases, for more information look at README.
[Step 2/12] Loading Inference Engine
[ INFO ] OpenVINO: Build ................................. 2022.3.0-9052-9752fafe8eb-HEAD
[ INFO ]
[Step 3/12] Setting device configuration
[Step 4/12] Reading the Intermediate Representation network
[ INFO ] Skipping the step for compiled network
[Step 5/12] Resizing network to match image sizes and given batch
[ INFO ] Skipping the step for compiled network
[Step 6/12] Configuring input of the model
[ INFO ] Skipping the step for compiled network
[Step 7/12] Loading the model to the device
[ INFO ] Importing model from /home/root/resnet-50-tf/RN50_Performance_b1.bin to HETERO:FPGA,CPU as Graph_0
Runtime arch check is enabled. Check started...
Runtime arch check passed.
Runtime build version check is enabled. Check started...
Runtime build version check passed.
[ INFO ] Import network took 3493.0785 ms
[Step 8/12] Setting optimal runtime parameters
[ WARNING ] Number of iterations was aligned by request number from 5 to 8 using number of requests 4
[Step 9/12] Creating infer requests and filling input blobs with images
[ INFO ] Filling input blobs for network ( Graph_0 )
[ INFO ] Network input 'map/TensorArrayStack/TensorArrayGatherV3' precision U8, dimensions (NCHW): 1 3 224 224
[ WARNING ] Some image input files will be ignored: only 8 are required from 10
[Step 10/12] Measuring performance (Start inference asyncronously, 4 inference requests using 1 streams for CPU, limits: 8 iterations with each graph)
WaitForDla polling timeout with threadId_0
If inference on one batch is expected to take more than 30 seconds, then increase WAIT_FOR_DLA_TIMEOUT in dlia_plugin.cpp and recompile the runtime.
../src/inference/src/ie_common.cpp:75 FATAL ERROR: inference on FPGA did not complete, jobs finished 0, jobs waited 0
[ ERROR ] Infer failed
Also, the FPGA DDR4 test in the BTS tool reports an error (screenshot attached).