Forum Discussion

RubenPadial
Contributor
10 months ago

Intel FPGA AI Suite Inference Engine

Is there any official documentation on the DLA runtime or inference engine for managing the DLA from the ARM side? I need to develop a custom application for running inference, but so far I’ve only found the dla_benchmark (main.cpp) and streaming_inference_app.cpp example files. There should be some documentation covering the SDK. The only related documentation I have found is the Intel FPGA AI Suite PCIe-based design example: https://www.intel.com/content/www/us/en/docs/programmable/768977/2024-3/fpga-runtime-plugin.html

From what I understand, the general inference workflow involves the following steps (a rough sketch of what I have in mind follows the list):

  1. Identify the hardware architecture
  2. Deploy the model
  3. Prepare the input data
  4. Send inference requests to the DLA
  5. Retrieve the output data
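
In code, I imagine something along these lines. This is only a minimal, untested sketch based on the 2021-style Inference Engine API that dla_benchmark uses; the device string "HETERO:FPGA,CPU", the file names and the preprocessing are placeholders, and I have not found where the architecture selection is documented, so please correct me if the flow is different:

    #include <inference_engine.hpp>
    #include <iostream>
    #include <string>
    using namespace InferenceEngine;

    int main() {
        Core core;

        // 1. Identify the hardware architecture: I assume the DLA arch is fixed when
        //    the bitstream/graph is compiled; at runtime I can at least list devices
        for (const auto& device : core.GetAvailableDevices())
            std::cout << "Found device: " << device << std::endl;

        // 2. Deploy the model: load the compiled graph through the HETERO/FPGA plugin
        CNNNetwork network = core.ReadNetwork("model.xml", "model.bin");
        ExecutableNetwork exeNetwork = core.LoadNetwork(network, "HETERO:FPGA,CPU");
        InferRequest request = exeNetwork.CreateInferRequest();

        // 3. Prepare the input data: fill the blob that belongs to the request
        std::string inputName = network.getInputsInfo().begin()->first;
        float* inputData = request.GetBlob(inputName)->buffer().as<float*>();
        // ... copy the preprocessed sample into inputData ...

        // 4. Send the inference request to the DLA
        request.Infer();

        // 5. Retrieve the output data
        std::string outputName = network.getOutputsInfo().begin()->first;
        const float* outputData = request.GetBlob(outputName)->cbuffer().as<const float*>();
        // ... post-process outputData ...

        return 0;
    }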

42 Replies

  • JohnT_Altera
    Regular Contributor

    Hi Ruben,


    I think you might need to only provide new input data, rather than re-creating the blob; otherwise the runtime treats it as a new inference setting.


    During the first run you perform all the setup; from the second run onwards you should only provide the new input data.
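
    Something like this is what I mean. It is just a rough, untested sketch using the plain OpenVINO Inference Engine API; newDataAvailable() and fillWithNewInput() are placeholders for your own input handling:

        // One-time setup: done on the first run only
        InferenceEngine::Core core;
        auto network = core.ReadNetwork("model.xml");
        auto exeNetwork = core.LoadNetwork(network, "HETERO:FPGA,CPU");
        auto request = exeNetwork.CreateInferRequest();
        std::string inputName = network.getInputsInfo().begin()->first;
        std::string outputName = network.getOutputsInfo().begin()->first;

        // From the second run onwards: only the input data changes; the request
        // and its blobs are reused, so no new inference setting is created
        while (newDataAvailable()) {
            float* in = request.GetBlob(inputName)->buffer().as<float*>();
            fillWithNewInput(in);   // copy the new sample into the existing blob
            request.Infer();
            const float* out = request.GetBlob(outputName)->cbuffer().as<const float*>();
            // ... use out ...
        }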


    • RubenPadial
      Contributor

      Hello @JohnT_Intel,

      Same behaviour.

      I changed the code to create the blobs once before the loop and only fill them inside the loop:


              // Create blobs only once before the loop
              using Blob_t = std::vector<std::map<std::string, Blob::Ptr>>;
              std::vector<std::pair<Blob_t, Blob_t>> ioBlobs = vectorMapWithIndex<std::pair<Blob_t, Blob_t>>(
                  exeNetworks, [&](ExecutableNetwork* const& exeNetwork, uint32_t index) mutable {
                      Blob_t inputBlobs;
                      Blob_t outputBlobs;
                      ConstInputsDataMap inputInfo = exeNetwork->GetInputsInfo();
                      ConstOutputsDataMap outputInfo = exeNetwork->GetOutputsInfo();
                      
                      for (uint32_t batch = 0; batch < num_batches; batch++) {
                          std::map<std::string, Blob::Ptr> outputBlobsMap;
                          for (auto& item : outputInfo) {
                              auto& precision = item.second->getTensorDesc().getPrecision();
                              if (precision != Precision::FP32) {
                                  THROW_IE_EXCEPTION << "Output blob creation only supports FP32 precision. Instead got: " + precision;
                              }
                              auto outputBlob = make_shared_blob<PrecisionTrait<Precision::FP32>::value_type>(item.second->getTensorDesc());
                              outputBlob->allocate();
                              outputBlobsMap[item.first] = (outputBlob);
                          }
      
                          std::map<std::string, Blob::Ptr> inputBlobsMap;
                          for (auto& item : inputInfo) {
                              Blob::Ptr inputBlob = nullptr;
                              auto& precision = item.second->getTensorDesc().getPrecision();
                              if (precision == Precision::FP32) {
                                  inputBlob = make_shared_blob<PrecisionTrait<Precision::FP32>::value_type>(item.second->getTensorDesc());
                              } else if (precision == Precision::U8) {
                                  inputBlob = make_shared_blob<PrecisionTrait<Precision::U8>::value_type>(item.second->getTensorDesc());
                              } else {
                                  THROW_IE_EXCEPTION << "Input blob creation only supports FP32 and U8 precision. Instead got: " + precision;
                              }
                              inputBlob->allocate();
                              inputBlobsMap[item.first] = (inputBlob);
                          }
      
                          inputBlobs.push_back(inputBlobsMap);
                          outputBlobs.push_back(outputBlobsMap);
                      }
                      
                      return std::make_pair(inputBlobs, outputBlobs);
                  }
              );
      
              std::cout << "Blobs initialized once before the loop.\n";
      
              while (1) {
              ...
                // Fill blobs with new input values (DO NOT re-create them)
                for (size_t i = 0; i < exeNetworks.size(); i++) {
                      slog::info << "Filling input blobs for network ( " << topology_names[i] << " )" << slog::endl;
                      fillBlobs(inputs, ioBlobs[i].first);  // Only fill the existing blobs
                 }
             ...
              }

      Error: dlia_infer_request.cpp:53 Number of inference requests exceed the maximum number of inference requests supported per instance

  • JohnT_Altera
    Regular Contributor

    Hi Ruben,


    I think you might need to try the OpenVINO example designs or another runtime example design (e.g., classification_sample_async or object_detection_demo) to see whether they work on your side.


    • RubenPadial
      Contributor

      Hello @JohnT_Intel,

      Both examples work, but they are intended for CPU/GPU. In addition, they collect multiple input images into a batch and request inference for the entire batch just like the benchmark example. The issue is related to FPGA DLA instantiation. I need to request an inference on every input event. For some reason, this creates a new DLA instance each time instead of reusing the existing one. This leads to an error once the number of inferences reaches five. Do you have any suggestions to address this?
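
      For reference, this is the per-event pattern I am trying to reach. It is only a simplified sketch: waitForInputEvent() is a placeholder for my input trigger, and I am assuming the standard OPTIMAL_NUMBER_OF_INFER_REQUESTS metric is meaningful for the DLA plugin:

          // Check how many infer requests this executable network claims to support
          unsigned int nireq = exeNetwork.GetMetric(
              METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS)).as<unsigned int>();
          std::cout << "Supported infer requests: " << nireq << std::endl;

          // Create ONE request up front and reuse it for every input event
          InferRequest request = exeNetwork.CreateInferRequest();
          std::string inputName = exeNetwork.GetInputsInfo().begin()->first;
          std::string outputName = exeNetwork.GetOutputsInfo().begin()->first;

          while (waitForInputEvent()) {          // blocks until new data arrives
              float* in = request.GetBlob(inputName)->buffer().as<float*>();
              // ... copy the new sample into 'in' ...
              request.StartAsync();
              request.Wait(InferRequest::WaitMode::RESULT_READY);
              const float* out = request.GetBlob(outputName)->cbuffer().as<const float*>();
              // ... handle the result ...
          }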

  • JohnT_Altera
    Regular Contributor

    Hi,

    Do you face any error when running with HETERO, or are you observing that the code intended for CPU/GPU is not working?

  • Hello @JohnT_Intel,

    I mean the original example you suggested is intended for CPU/GPU.

    The real problem is how the inferences are managed. The examples collect multiple input images into a batch and request inference for the entire batch. I need to request an inference every time new data is available. That's when the DLA instantiation problem arises.