Re: Intel FPGA AI Suite Inference Engine

Hello @JohnT_Intel,

I mean the original example you suggested is indeed CPU/GPU. The real problem is how inferences are managed. The examples collect multiple input images into a batch and request inference for the entire batch. I need to request an inference every time new data is available; that is when the DLA instantiation problem arises.

Re: Intel FPGA AI Suite Inference Engine

Hello @JohnT_Intel,

I used the HETERO FPGA plugin.

Re: Intel FPGA AI Suite Inference Engine

Hello @JohnT_Intel,

Both examples work, but they are intended for CPU/GPU. In addition, they collect multiple input images into a batch and request inference for the entire batch, just like the benchmark example. The issue is related to FPGA DLA instantiation: I need to request an inference on every input event, but for some reason this creates a new DLA instance each time instead of reusing the existing one, which leads to an error once the number of inferences reaches five. Do you have any suggestions to address this?

Re: Intel FPGA AI Suite Inference Engine

Hello @JohnT_Intel,

Same behaviour. I changed the code to create the blobs before the loop and only fill them inside the loop:

// Create blobs only once before the loop
using Blob_t = std::vector<std::map<std::string, Blob::Ptr>>;
std::vector<std::pair<Blob_t, Blob_t>> ioBlobs =
    vectorMapWithIndex<std::pair<Blob_t, Blob_t>>(
        exeNetworks,
        [&](ExecutableNetwork* const& exeNetwork, uint32_t index) mutable {
            Blob_t inputBlobs;
            Blob_t outputBlobs;
            ConstInputsDataMap inputInfo = exeNetwork->GetInputsInfo();
            ConstOutputsDataMap outputInfo = exeNetwork->GetOutputsInfo();
            for (uint32_t batch = 0; batch < num_batches; batch++) {
                std::map<std::string, Blob::Ptr> outputBlobsMap;
                for (auto& item : outputInfo) {
                    auto& precision = item.second->getTensorDesc().getPrecision();
                    if (precision != Precision::FP32) {
                        THROW_IE_EXCEPTION << "Output blob creation only supports FP32 precision. Instead got: " + precision;
                    }
                    auto outputBlob = make_shared_blob<PrecisionTrait<Precision::FP32>::value_type>(item.second->getTensorDesc());
                    outputBlob->allocate();
                    outputBlobsMap[item.first] = outputBlob;
                }
                std::map<std::string, Blob::Ptr> inputBlobsMap;
                for (auto& item : inputInfo) {
                    Blob::Ptr inputBlob = nullptr;
                    auto& precision = item.second->getTensorDesc().getPrecision();
                    if (precision == Precision::FP32) {
                        inputBlob = make_shared_blob<PrecisionTrait<Precision::FP32>::value_type>(item.second->getTensorDesc());
                    } else if (precision == Precision::U8) {
                        inputBlob = make_shared_blob<PrecisionTrait<Precision::U8>::value_type>(item.second->getTensorDesc());
                    } else {
                        THROW_IE_EXCEPTION << "Input blob creation only supports FP32 and U8 precision. Instead got: " + precision;
                    }
                    inputBlob->allocate();
                    inputBlobsMap[item.first] = inputBlob;
                }
                inputBlobs.push_back(inputBlobsMap);
                outputBlobs.push_back(outputBlobsMap);
            }
            return std::make_pair(inputBlobs, outputBlobs);
        });

std::cout << "Blobs initialized once before the loop.\n";

while (1) {
    ...
    // Fill blobs with new input values (DO NOT re-create them)
    for (size_t i = 0; i < exeNetworks.size(); i++) {
        slog::info << "Filling input blobs for network ( " << topology_names[i] << " )" << slog::endl;
        fillBlobs(inputs, ioBlobs[i].first);  // Only fill the existing blobs
    }
    ...
}

Error:

dlia_infer_request.cpp:53 Number of inference requests exceed the maximum number of inference requests supported per instance

Re: Intel FPGA AI Suite Inference Engine

Hello @JohnT_Intel,

As I said, it is also included in dla_benchmark as well as in the application I shared with you. It doesn't work. Find the extracted code below:

for (size_t iireq = 0; iireq < nireq; iireq++) {
    auto inferRequest = inferRequestsQueues.at(net_id)->getIdleRequest();
    if (!inferRequest) {
        THROW_IE_EXCEPTION << "No idle Infer Requests!";
    }
    if (niter != 0LL) {
        std::cout << "#Debug: 10. Set output blob.\n";
        for (auto& item : outputInfos.at(net_id)) {
            std::string currOutputName = item.first;
            auto currOutputBlob = ioBlobs.at(net_id).second[iterations.at(net_id)][currOutputName];
            inferRequest->SetBlob(currOutputName, currOutputBlob);
        }
        std::cout << "#Debug: 10. Set input blob.\n";
        for (auto& item : inputInfos.at(net_id)) {
            std::string currInputName = item.first;
            auto currInputBlob = ioBlobs.at(net_id).first[iterations.at(net_id)][currInputName];
            inferRequest->SetBlob(currInputName, currInputBlob);
        }
    }
    // Execute one request/batch
    if (FLAGS_api == "sync") {
        inferRequest->infer();
    } else {
        // As the inference request is currently idle, wait() adds no overhead
        // (it should return immediately). It is called here mainly for exception
        // checking/re-throwing: the callback that governs the actual execution can
        // handle errors as well, but it only reports error codes and has no detail
        // like the what() method of std::exception, so we re-check for exceptions here.
        inferRequest->wait();
        inferRequest->startAsync();
    }
    iterations.at(net_id)++;
    if (net_id == exeNetworks.size() - 1) {
        execTime = std::chrono::duration_cast<ns>(Time::now() - startTime).count();
        if (niter > 0) {
            progressBar.addProgress(1);
        } else {
            // Calculate how many progress intervals are covered by the current
            // iteration, based on the current iteration time and the time of each
            // progress interval. Previously covered intervals must be skipped.
            auto progressIntervalTime = duration_nanoseconds / progressBarTotalCount;
            size_t newProgress = execTime / progressIntervalTime - progressCnt;
            progressBar.addProgress(newProgress);
            progressCnt += newProgress;
        }
    }
}

Re: Intel FPGA AI Suite Inference Engine

Hello @JohnT_Intel,

The same. It has a C++ example, but no wait_all or similar function is used in it; wait_all appears only in the Python example. The C++ example uses:

for (ov::InferRequest& ireq : ireqs) {
    ireq.wait();
}

which is similar to the code I shared with you.
Re: Intel FPGA AI Suite Inference Engine

Hello @JohnT_Intel,

dla_benchmark is implemented in C++. The API documentation you shared in the previous comment is for Python, and the example that uses wait_all is also implemented in Python. There is also a C++ example, but it doesn't use wait_all, waitAll, or any similar function. In addition, the OpenVINO documentation is available, but the OpenVINO version required by the latest FPGA AI Suite (2024.3) is 2023.3.

Re: Intel FPGA AI Suite Inference Engine

Hello @JohnT_Intel,

The following statement is present in the code I shared with you:

std::cout << "#Debug: 10. waitAll.\n";
// Wait for the latest inference executions
for (auto& inferRequestsQueue : inferRequestsQueues)
    inferRequestsQueue->waitAll();

Is this what you are referring to? It doesn't work; maybe it is not used correctly. Do you have a pseudocode example?

Re: Intel FPGA AI Suite Inference Engine

Hello @JohnT_Intel,

Is there any news on this topic? I'm using the S2M design, in case that helps find an alternative solution based on a streaming application.

Re: Intel FPGA AI Suite Inference Engine

Hello @JohnT_Intel,

Sorry. Since you suggested reusing the inference request instead of creating a new one for each inference, I thought the solution was trivial and that the problem was in my implementation or understanding. I look forward to a solution.

I believe this is the correct way to use the DLA in a real application: deploy the accelerator and configure it with the graph, then keep it configured and continuously feed it new data for inference. Isn't that right? Of course, each new inference must wait for the previous one to finish. Is this correct, or have I misunderstood something about the accelerator's working principle?
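The usage model described above (configure the graph once, then keep feeding the same request with new data) corresponds to the simplest synchronous single-request pattern. A pseudocode sketch using the legacy Inference Engine API names already present in this thread (not compiled here; waitForNewData, fillInput, and readOutput are hypothetical helpers, and the device string is an assumption based on the HETERO FPGA plugin mentioned earlier):

```cpp
// Configure once: load the graph onto the accelerator a single time.
ExecutableNetwork exeNetwork = core.LoadNetwork(network, "HETERO:FPGA,CPU");
InferRequest request = exeNetwork.CreateInferRequest();  // created once, reused forever

while (true) {
    waitForNewData();   // hypothetical: block until an input event arrives
    fillInput(request); // hypothetical: write new data into the existing blobs
    request.Infer();    // synchronous: returns when this inference is done,
                        // so each inference naturally waits for the previous one
    readOutput(request); // hypothetical: consume the results
}
```

Because Infer() blocks until completion, only one request is ever outstanding, which avoids hitting any per-instance request limit by construction.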