Agilex 7 M-Series for LLaMA

Question

I am working on a project to deploy LLaMA on an Agilex 7 M-Series FPGA using the FPGA AI Suite. However, dla_compiler does not offload any part of the graph to the FPGA, so I cannot generate a .bin file suitable for the FPGA. Is this because the Gather operator is not supported, or because the tensors have dynamic dimensions? In addition, the FPGA AI Suite does not provide an ARCH file for HBM, and the list of selectable devices for the plugin does not include the M-Series. Could you provide BSP support for it?

johnt_altera · Answer

Hi, to use HBM, please refer to https://www.intel.com/content/www/us/en/docs/programmable/854150/current/introduction.html for how to enable it. For architecture support, you will need to modify the ARCH file so that it is able to run the model on the FPGA.

lexie11 · Answer

Hi, based on the above information, I have some follow-up questions:
1. The Agilex 7 M-Series does not come with a BSP. If I want to deploy LLaMA on Agilex 7, is generating the BSP through OFS customization my only option?
2. I did not find HBM in the RTL support provided by OFS. Do I need to generate it myself in Quartus Prime Pro?
3. The example_architecture folder does not contain anything about HBM. How should I modify the arch file in order to use HBM?

johnt_altera · Answer

Hi,
1. Agilex 7 M-Series does not provide a BSP. If I want to deploy LLaMA on Agilex 7, can I only generate the BSP through OFS customization?
Yes, you are correct. You will need to customize it.
2. In the RTL support provided by OFS, I did not find HBM. Is it necessary to generate it myself through Quartus Prime Pro?
You can generate OFS for the M-Series dev kit with "./ofs-common/scripts/common/syn/build_top.sh --ofss tools/ofss_config/mseries-dk.ofss mseries-dk:flat work_mseries-dk". Please refer to https://github.com/OFS/ofs-agx7-pcie-attach for all the supported BSPs.
3. In the example_architecture folder, no information about HBM is provided. How do I modify the arch file in order to use HBM?
The arch file does not need to be changed, because HBM is handled by the BSP. Performance will differ when the model runs from HBM compared to DDR.
Thanks.
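The OFS build command quoted in the answer can also be scripted. The sketch below is a hypothetical Python wrapper (the names `ofs_build_argv` and `run_ofs_build` are made up for illustration). It assumes it is run against a checkout of the ofs-agx7-pcie-attach repository with Quartus and the repository's environment already set up, and it only prints the command unless `dry_run=False` is passed.

```python
import shlex
import subprocess

# Hypothetical wrapper around the OFS build command quoted in the answer.
# Defaults are the M-Series dev kit values given in the thread.
def ofs_build_argv(ofss="tools/ofss_config/mseries-dk.ofss",
                   target="mseries-dk:flat",
                   work_dir="work_mseries-dk"):
    """Return the build_top.sh invocation as an argument list."""
    return ["./ofs-common/scripts/common/syn/build_top.sh",
            "--ofss", ofss, target, work_dir]


def run_ofs_build(repo_dir=".", dry_run=True):
    """Print the command; run it from repo_dir only when dry_run is False."""
    argv = ofs_build_argv()
    print(shlex.join(argv))
    if not dry_run:
        # The relative script path is resolved against cwd (repo_dir).
        # check=True raises CalledProcessError if the build script fails.
        subprocess.run(argv, cwd=repo_dir, check=True)


if __name__ == "__main__":
    run_ofs_build()
```

Keeping the arguments as a list (rather than one shell string) avoids quoting problems if a work directory path ever contains spaces.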

lexie11 · Answer

Thanks for your response. During further attempts, I ran into some new questions:
1. If I want to deploy LLaMA using the SoC approach (Arm + FPGA), should I use the S2M method? Here's the link for reference: https://www.intel.com/content/www/us/en/docs/programmable/848957/2025-1/soc-design-example-system-architecture.html

2. The solution you provided transfers data over PCIe. Is it only applicable to host + FPGA deployments?

3. Does the FPGA AI Suite IP support offloading the operators of transformer-based LLMs? When I compile my LLaMA IR with dla_compiler, it cannot split the graph into FPGA subgraphs, so the whole model can only run on the CPU.

4. When using dla_compiler to deploy LLaMA with the HBM architecture, should I select AGX7_Performance_Transform.arch as the arch file?

5. One more question: is the following the correct process for deploying LLaMA heterogeneously on the Agilex 7 M-Series with the SoC approach?
a. Convert the model (ONNX/PyTorch/Transformers) to IR format (.xml, .bin) with OpenVINO.
b. In the FPGA AI Suite dla_compiler, select AGX7_Performance_Transform.arch and the model's corresponding .yml file.
c. Use dla_create_ip to turn the generated FPGA-ready .bin file into IP.
d. Add the IP in Quartus Platform Designer and export the bitstream.
e. Add the bitstream to the Yocto build.
f. Write the resulting Yocto image to the SD card and run it.
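For step a, one thing worth trying given the dynamic-dimension question is to pin the inputs to static shapes when writing the IR. The sketch below assumes the OpenVINO 2023.1+ Python API (`openvino.Core.read_model`, `Model.reshape`, `openvino.save_model`), an ONNX export whose inputs are all 2-D `[batch, seq_len]` tensors such as input_ids and attention_mask (KV-cache inputs, if present, would need their own shapes), and hypothetical helper names. The .bin written here is the IR weights file, not the FPGA-ready .bin that dla_compiler produces.

```python
# Sketch: convert an ONNX export to OpenVINO IR with static input shapes.
# Assumption: every model input is 2-D [batch, seq_len].

def static_shape_map(input_names, batch=1, seq_len=128):
    """Map each input name to a fixed [batch, seq_len] shape."""
    if batch < 1 or seq_len < 1:
        raise ValueError("batch and seq_len must be positive")
    return {name: [batch, seq_len] for name in input_names}


def convert_to_static_ir(onnx_path, xml_path, batch=1, seq_len=128):
    """Read an ONNX model, fix its input shapes, and save it as IR."""
    import openvino as ov  # imported lazily; requires OpenVINO 2023.1 or later

    core = ov.Core()
    model = core.read_model(onnx_path)
    names = [inp.get_any_name() for inp in model.inputs]
    model.reshape(static_shape_map(names, batch, seq_len))
    # Writes xml_path plus a matching .bin weights file next to it.
    ov.save_model(model, xml_path)
    return names
```

Usage: `convert_to_static_ir("llama.onnx", "llama_static.xml", seq_len=128)`. Whether dla_compiler can then place more of the graph on the FPGA still depends on the operator support in your FPGA AI Suite release; static shapes alone may not be enough.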
During this process, I have a few more questions:
OpenVINO does not generate the .yml file. How can I obtain it?
There are no files with HBM in the name in the example_architecture folder. Is HBM handled the same way as DDR, since both communicate with the HPS and other IPs over AXI?
Is the Quartus project containing the IP built by dla_create_ip the same as the example project provided for S2M?
