AI Suite - Custom model in the FPGA building process
Hello Altera Community. My question is: where in the FPGA building process do I incorporate my custom neural network into the design?

This is my current understanding of the FPGA building process: The IP block is generated with the dla create ip script, which takes an arch file as input. The IP block is placed in Platform Designer and then connected to memory and signals. After compiling, the data is sent to the design using the runtime (JTAG being the slowest).

Where does the NN model I made with PyTorch get incorporated into all this?

4 Views, 0 likes, 0 Comments

Any date for the release of the Docker image alterafpga/fpgaaisuite-quartus-v2026.1.1?
The FPGA AI Suite Handbook v2026.1.1 refers to Docker images alterafpga/fpgaaisuite-quartus-v2026.1.1 and alterafpga/fpgaaisuite-v2026.1.1, but Docker Desktop stops at alterafpga/fpgaaisuite-v2025.3. Could anyone give a best guess as to when it will become available?

Best wishes,
Jeremy

35 Views, 0 likes, 3 Comments

Downloading AI Suite deb file returns text file
Hello, I'm at the Altera Download Center. I have tried to download the *.deb version for the FPGA AI Suite Version 2026.1.1. After selecting the "Accept" option in the Software License Agreement modal box, the web server returns a text file for the .deb binary. I believe there is a bug with your web server configuration for this file that fools my browser into believing it's downloading a text file. It's difficult to debug the page myself since the Accept button seems to trigger a JS event. I'm using Firefox on Ubuntu Linux 24.04.

Thank you

Solved, 74 Views, 0 likes, 5 Comments

Is Spatial IP ready for LLM / transformer inference?
I am using FPGA AI Suite 2026.1.1 (with the new spatial compiler). Most of the FPGA AI Suite handbook examples I see are classical CNN / vision flows (ResNet-style) on PCIe, hostless JTAG, and SoC.

Is transformer / LLM inference (attention layers, variable sequence lengths, large KV-cache activations, etc.) something we can target today with dla_compiler + Spatial IP, or is Spatial still aimed primarily at CNN-like graphs, or is custom RTL expected? And if yes, are there any LLM examples, guides, recommended flows, or known limitations?

Thanks,

87 Views, 0 likes, 3 Comments

AI Suite - What is the purpose of the create HPS Image script?
I'm trying to understand the output of the create hps image script. Executing the script produces a number of files. One of these is a wic file that can be written to an SD card and is used to boot Linux on the SoC. However, when inspecting the Linux system, it does not contain any files related to OpenVINO or CoreDLA. I expected there to be tools that help run inference on the SoC FPGA. What is the purpose of this script? I'm aware of the figure shown in the handbook, but it only explains the flow of the script, not the output.

40 Views, 0 likes, 1 Comment

Can Intel's AI Reference Kit LLM pipelines run on OpenVINO runtime inside FPGA AI Suite 26.1.1?
I run OpenVINO + FPGA AI Suite 26.1.1 in two setups:

PCIe: OpenVINO on x86 Linux host → FPGA card
SoC: OpenVINO on Arm Linux (HPS) → FPGA AI Suite IP over AXI

Intel's AI Reference Kits include ready-made LLM inference pipelines built on OpenVINO.
https://www.intel.com/content/www/us/en/developer/topic-technology/edge-5g/open-potential.html

I want to take one of these pipelines and run it using the OpenVINO runtime that ships inside FPGA AI Suite, so the FPGA handles the inference instead of the CPU. Is the bundled OpenVINO runtime + FPGA plugin / spatial compiler in 2026.1.1 compatible with these Reference Kit LLM pipelines? If it does not work directly out of the box, what modifications would be needed?

Thanks,

38 Views, 0 likes, 1 Comment

Error faced while executing on Agilex FPGA board....
Hi team,

I am trying to port a Python model to an Agilex 7 FPGA I-Series board. I have successfully installed OpenVINO (2024.6) and FPGA AI Suite (2025.3) on my host PC. Using OpenVINO I am able to create IR files, and I compiled the graphs using FPGA AI Suite. As the next step, I copied the files onto an SD card and am trying to run them on the FPGA. I am using the command below to check the performance of my Python model on the FPGA:

./dla_benchmark -cm Performance.bin -d HETERO:FPGA -i ./input_bins -bin_data -niter 1000 - nthreads 4 -pin YES -pc -pcsort sort -plugins=./model_opt_4_snr_20_50_ds_5_805.xml -report_folder ./reports_pc -stream_output

Please check the attached screenshot for the error. I would appreciate any suggestions for a solution.

116 Views, 0 likes, 4 Comments

AI Suite System Throughput Issue
When using AI Suite, we are seeing a significant gap between IP throughput and achieved system throughput on Agilex 5. I am using the following:

Hardware: Agilex™ 5 FPGA and SoC E-Series Modular Development Kit (ES silicon)
Software: Quartus Prime Pro + AI Suite 25.3.1
SD Image: agx5_soc_s2m coredla-image-agilex5_mk_a5e065bb32aes1.wic
Architecture and Bitstream: AGX5_Performance

Using MobileNetV2 (Open Model Zoo 2024.6.0) compiled with the AGX5_Performance architecture gives the following results through dla_benchmark:

IP throughput per instance: ~151 FPS
Estimated throughput (200 MHz): ~178 FPS
System throughput:
nireq=1 → 41 FPS
nireq=4 → 54 FPS

Why is there such a big delta between IP performance and system throughput, and how can we improve the system throughput? For more details, please see the appended log showing the commands that I ran to do the benchmark. Any pointers or help would be highly appreciated.

Thanks

=====================================================================
1. Using mobilenet v2 from model zoo
=====================================================================

Commands used to download and compile model:

git clone https://github.com/openvinotoolkit/open_model_zoo.git
cd open_model_zoo
git checkout 2024.6.0
omz_downloader --list
omz_downloader --name mobilenet-v2-pytorch --output_dir $COREDLA_WORK/demo/models/
omz_converter --name mobilenet-v2-pytorch --download_dir ../demo/models/ --output_dir ../demo/models/
cd $COREDLA_WORK/demo/models/public/mobilenet-v2-pytorch/FP32
dla_compiler --march $COREDLA_ROOT/example_architectures/AGX5_Performance.arch --network-file ./mobilenet-v2-pytorch.xml --foutput-format=open_vino_hetero --o $COREDLA_WORK/demo/mobilenet-v2-pytorch_dla.bin --batch-size=1 --fanalyze-performance --fassumed-fmax-core 200

Executing performance estimate
----------------------------------------------------------------
main_graph_0 reported throughput: 178.617 fps
TOTAL DDR SPACE REQUIRED = 16.9756 MB
DDR INPUT & OUTPUT BUFFER SIZE = 0.781738 MB
DDR CONFIG BUFFER SIZE = 0.0986328 MB
DDR FILTER BUFFER SIZE = 15.3296 MB
DDR INTERMEDIATE BUFFER SIZE = 0.765625 MB
NOTE: THIS ESTIMATE ASSUMES 1x I/O BUFFER. THE COREDLA RUNTIME DEFAULTS TO 5
TOTAL DDR TRANSFERS REQUIRED = 18.7003 MB
DDR FILTER READS REQUIRED = 16.2124 MB
DDR FEATURE READS REQUIRED = 1.62164 MB
DDR FEATURE WRITES REQUIRED = 0.767578 MB
NUMBER OF DDR FEATURE READS = 9
MINIMUM AVERAGE DDR BANDWIDTH REQUIRED = 3340.19 MB/s
ASSUMED DDR BANDWIDTH PER IP INSTANCE = 6400 MB/s
----------------------------------------------------------------
Performance Estimator Throughput Breakdown
Arch: kvec64xcvec32_i12x1_fp12agx_sb32768_xbark32_actk32_poolk4
Number of DLA instances = 1
Number of DDR Banks per DLA instance = 1
CoreDLA Target Fmax = 200 MHz
PE Target Fmax = 200 MHz
Batch Size = 1
PE-only Conv Throughput No DDR = 186 fps
PE-only Conv Throughput = 185 fps
Overall Throughput Inf PE Buf Depth (zero MPBW) = 185 fps
Overall Throughput Zero PE Buf Depth (zero MPBW) = 183 fps
Overall Throughput Inf PE Buf Depth = 184 fps
Overall Throughput Zero PE Buf Depth = 182 fps
----------------------------------------------------------------
FINAL THROUGHPUT = 178.617 fps
FINAL THROUGHPUT PER FMAX (CoreDLA) = 0.893086 fps/MHz
FINAL THROUGHPUT PER FMAX (PE) = 0.893086 fps/MHz

Running the model on dev kit:

./dla_benchmark -b=1 -cm $compiled_model -d=HETERO:FPGA,CPU -i $imgdir -niter=8 -plugins ./plugins.xml -arch_file $archfile -api=async -groundtruth_loc $imgdir/ground_truth.txt -perf_est -nireq=1 -bgr -nthreads=1

[Step 11/12] Dumping statistics report
count: 8 iterations
system duration: 191.3784 ms
IP duration: 52.7551 ms
latency: 23.4076 ms
system throughput: 41.8020 FPS
number of hardware instances: 1
number of network instances: 1
IP throughput per instance: 151.6441 FPS
IP throughput per fmax per instance: 0.7582 FPS/MHz
IP clock frequency measurement: 200.0000 MHz
estimated IP throughput per instance: 178.6172 FPS (200 MHz assumed)
estimated IP throughput per fmax per instance: 0.8931 FPS/MHz

./dla_benchmark -b=1 -cm $compiled_model -d=HETERO:FPGA,CPU -i $imgdir -niter=8 -plugins ./plugins.xml -arch_file $archfile -api=async -groundtruth_loc $imgdir/ground_truth.txt -perf_est -nireq=4 -bgr -nthreads=4

[Step 11/12] Dumping statistics report
count: 8 iterations
system duration: 147.8426 ms
IP duration: 52.7619 ms
latency: 69.8254 ms
system throughput: 54.1116 FPS
number of hardware instances: 1
number of network instances: 1
IP throughput per instance: 151.6246 FPS
IP throughput per fmax per instance: 0.7581 FPS/MHz
IP clock frequency measurement: 200.0000 MHz
estimated IP throughput per instance: 178.6172 FPS (200 MHz assumed)
estimated IP throughput per fmax per instance: 0.8931 FPS/MHz

155 Views, 0 likes, 5 Comments
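
As a rough cross-check of the figures in the log above, the following Python sketch recomputes the reported throughputs from the logged durations and iteration count. All numbers are copied from the post; the function and variable names are made up for illustration. One assumption is labeled in the code: treating "system duration minus IP duration" as time spent outside the IP (host-side pre/post-processing, data movement, scheduling) is an interpretation of the log, not something dla_benchmark states.

```python
# Figures copied from the dla_benchmark / dla_compiler output in the post.
ITERATIONS = 8  # "count: 8 iterations"

# Keyed by the -nireq value used for the run.
RUNS = {
    1: {"system_ms": 191.3784, "ip_ms": 52.7551},
    4: {"system_ms": 147.8426, "ip_ms": 52.7619},
}

# From the performance estimator section of the compiler log.
EST_FPS = 178.617            # FINAL THROUGHPUT
DDR_MB_PER_FRAME = 18.7003   # TOTAL DDR TRANSFERS REQUIRED


def fps(frames, duration_ms):
    """Frames per second for a number of frames processed in duration_ms."""
    return frames / (duration_ms / 1000.0)


def summarize(run):
    """Recompute throughput figures for one benchmark run."""
    system_fps = fps(ITERATIONS, run["system_ms"])
    ip_fps = fps(ITERATIONS, run["ip_ms"])
    # ASSUMPTION: the gap between system and IP duration is time spent
    # outside the IP. This is an interpretation, not a logged quantity.
    non_ip_ms_per_frame = (run["system_ms"] - run["ip_ms"]) / ITERATIONS
    ip_share = run["ip_ms"] / run["system_ms"]
    return {
        "system_fps": system_fps,
        "ip_fps": ip_fps,
        "non_ip_ms_per_frame": non_ip_ms_per_frame,
        "ip_share": ip_share,
    }


def required_ddr_bandwidth(mb_per_frame, frames_per_second):
    """Average DDR bandwidth in MB/s needed to sustain a given frame rate."""
    return mb_per_frame * frames_per_second


if __name__ == "__main__":
    for nireq, run in RUNS.items():
        s = summarize(run)
        print(
            f"nireq={nireq}: system {s['system_fps']:.2f} FPS, "
            f"IP {s['ip_fps']:.2f} FPS, "
            f"non-IP time {s['non_ip_ms_per_frame']:.2f} ms/frame, "
            f"IP share of wall time {s['ip_share']:.0%}"
        )
    bw = required_ddr_bandwidth(DDR_MB_PER_FRAME, EST_FPS)
    print(f"estimated DDR bandwidth: {bw:.2f} MB/s")
```

Under that assumption, the IP is active for only a little over a quarter of the wall-clock time in the nireq=1 run (about 17.3 ms per frame falls outside the IP) and about 11.9 ms per frame falls outside it with nireq=4, which suggests the gap is dominated by work outside the IP rather than by the IP itself. Note also that with only 8 iterations, one-time startup costs may weigh heavily on the system figure.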