Forum Discussion

RubenPadial
10 months ago

Intel FPGA AI Suite Inference Engine

Is there any official documentation on the DLA runtime or inference engine for managing the DLA from the ARM side? I need to develop a custom application for running inference, but so far I've only found the dla_benchmark (main.cpp) and streaming_inference_app.cpp example files. There should be some documentation covering the SDK. The only related documentation I found is the Intel FPGA AI Suite PCIe-based design example: https://www.intel.com/content/www/us/en/docs/programmable/768977/2024-3/fpga-runtime-plugin.html

From what I understand, the general inference workflow involves the following steps:

  1. Identify the hardware architecture
  2. Deploy the model
  3. Prepare the input data
  4. Send inference requests to the DLA
  5. Retrieve the output data
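The five steps above map fairly directly onto the legacy InferenceEngine C++ API that dla_benchmark is built on. A minimal sketch, assuming that API and an AI Suite runtime that registers an "FPGA" device (the model path, device string, and exact headers are placeholders and vary by release, so treat this as an outline rather than the official SDK usage):

```cpp
#include <inference_engine.hpp>
#include <iostream>
#include <string>

int main() {
    namespace IE = InferenceEngine;
    IE::Core core;

    // 1. Identify the hardware: the AI Suite runtime plugin should
    //    appear in this list (e.g. as "FPGA") once it is registered.
    for (const auto &dev : core.GetAvailableDevices())
        std::cout << "device: " << dev << "\n";

    // 2. Deploy the model: read the compiled IR (placeholder path) and
    //    load it on the DLA, falling back to CPU for unsupported layers.
    IE::CNNNetwork network = core.ReadNetwork("model.xml");
    IE::ExecutableNetwork exec = core.LoadNetwork(network, "HETERO:FPGA,CPU");

    // 3. Prepare the input data: fill the input blob in place.
    IE::InferRequest request = exec.CreateInferRequest();
    const std::string inputName = network.getInputsInfo().begin()->first;
    IE::Blob::Ptr input = request.GetBlob(inputName);
    // ... write preprocessed data into the input blob ...

    // 4. Send the inference request to the DLA (synchronous here;
    //    StartAsync()/Wait() is the asynchronous variant).
    request.Infer();

    // 5. Retrieve the output data.
    const std::string outputName = network.getOutputsInfo().begin()->first;
    IE::Blob::Ptr output = request.GetBlob(outputName);
    // ... read results from the output blob ...
    return 0;
}
```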

42 Replies

  • JohnT_Altera

    Hi,


    Can you share your code or steps so that I can try to duplicate the issue on my side?


  • JohnT_Altera

    Hi,


    You may use the dla_benchmark app as a starting point and modify it from there. The new method should check the device name using "device_name.find("FPGA")".
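    The device-name check mentioned here is a plain std::string::find. A self-contained sketch of that filtering logic (the device names below are examples; in a real application the list would come from Core::GetAvailableDevices()):

    ```cpp
    #include <iostream>
    #include <string>
    #include <vector>

    // Keep only the devices whose name contains "FPGA", mirroring the
    // device_name.find("FPGA") check suggested for dla_benchmark.
    std::vector<std::string> filterFpgaDevices(const std::vector<std::string> &devices) {
        std::vector<std::string> fpga;
        for (const auto &name : devices) {
            if (name.find("FPGA") != std::string::npos)
                fpga.push_back(name);
        }
        return fpga;
    }

    int main() {
        const std::vector<std::string> devices = {"CPU", "FPGA", "HETERO:FPGA,CPU"};
        for (const auto &d : filterFpgaDevices(devices))
            std::cout << d << "\n";   // prints FPGA and HETERO:FPGA,CPU
        return 0;
    }
    ```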


    • RubenPadial

      Hello @JohnT_Intel

      Taking dla_benchmark as an example, I get the following error:

      [ ERROR ]

      runtime/hps_packages/openvino/src/inference/src/ie_common.cpp:75
      runtime/plugin/src/dlia_infer_request.cpp:53 Number of inference requests exceed the maximum number of inference requests supported per instance 5

      I'm looping the inference request because I need to instantiate the DLA and continuously request inferences with new data. Each inference must be a single request, so I set nireq=1 and niter=1. Once an inference is finished, I request a new one with new input data.

      Therefore, I loop from step no. 9 to 11, obtaining the new input data before filling the blobs.

      Is this approach correct? I understand a real application needs to instantiate the DLA once and keep feeding it new input data to compute the CNN output.
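      For this streaming pattern, one option (a sketch against the same legacy InferenceEngine API; haveNewData, fillBlob, nextFrame, and consume are hypothetical application helpers, not SDK calls) is to create the executable network and a single infer request once at startup and then reuse that request, rather than creating a new request per iteration, which is what appears to trip the "maximum number of inference requests supported per instance 5" limit:

      ```cpp
      // Create the network and ONE infer request at startup...
      IE::ExecutableNetwork exec = core.LoadNetwork(network, "HETERO:FPGA,CPU");
      IE::InferRequest request = exec.CreateInferRequest();
      const std::string inputName  = network.getInputsInfo().begin()->first;
      const std::string outputName = network.getOutputsInfo().begin()->first;

      // ...then reuse the same request for every new frame instead of
      // recreating requests inside the loop.
      while (haveNewData()) {                    // hypothetical data source
          IE::Blob::Ptr input = request.GetBlob(inputName);
          fillBlob(input, nextFrame());          // hypothetical preprocessing
          request.Infer();                       // blocks until the DLA finishes
          consume(request.GetBlob(outputName));  // hypothetical postprocessing
      }
      ```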




  • JohnT_Altera

    Hi Ruben,


    Currently the only documentation is from the OpenVINO tools. When you use HETERO:FPGA,CPU, OpenVINO will run the AI workload on the FPGA whenever possible; any layer that cannot run on the FPGA is performed on the CPU side instead. OpenVINO communicates with the FPGA MMD driver automatically.
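    Assuming the legacy InferenceEngine API, the HETERO layer assignment can also be inspected before loading the network: QueryNetwork reports which device each layer would run on (the model path is a placeholder, and plugin behavior varies by release, so this is a sketch rather than a guaranteed recipe):

    ```cpp
    IE::Core core;
    IE::CNNNetwork network = core.ReadNetwork("model.xml");  // placeholder path

    // Ask the HETERO plugin which layers land on FPGA vs. CPU;
    // layers the DLA cannot execute fall back to CPU automatically.
    IE::QueryNetworkResult q = core.QueryNetwork(network, "HETERO:FPGA,CPU", {});
    for (const auto &kv : q.supportedLayersMap)
        std::cout << kv.first << " -> " << kv.second << "\n";
    ```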


    Let me know if you have further queries on this or you need any help on this.


    • RubenPadial

      Hello @JohnT_Intel ,

      But when I use the "GetAvailableDevices()" method, I only get CPU as an available device. There must be something I missed.
      From my point of view, there are some points that need to be clarified from the Intel/Altera side in order to use the OpenVINO tools on FPGA devices with the FPGA AI Suite.
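      One way to diagnose this (a sketch with the same legacy-API caveats; the plugin library name below is a placeholder, not a confirmed AI Suite file name) is to list the devices and, if "FPGA" is missing, register the runtime plugin explicitly instead of relying on the plugins.xml bundled with the runtime:

      ```cpp
      IE::Core core;
      bool hasFpga = false;
      for (const auto &d : core.GetAvailableDevices()) {
          std::cout << d << "\n";
          if (d.find("FPGA") != std::string::npos) hasFpga = true;
      }
      if (!hasFpga) {
          // If only "CPU" is listed, the FPGA plugin library was not found
          // or not registered. It can be registered by hand; the library
          // name here is a placeholder for the AI Suite runtime plugin.
          core.RegisterPlugin("libfpga_runtime_plugin.so", "FPGA");
      }
      ```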

  • JohnT_Altera

    Hi Ruben,


    Currently we do not have any document published. Let me check internally whether we have any documentation to share.


    • RubenPadial

      Hello @JohnT_Intel ,

      I know both example applications are based on the OpenVINO runtime, but I cannot find anything about the FPGA and HETERO plugins for making inferences in HETERO:FPGA,CPU mode. This is the documentation I found: https://docs.openvino.ai/archives/index.html

      Any official documentation from the Intel side would be very helpful to make the Intel FPGA AI Suite really useful.