Forum Discussion

lexie11
New Contributor
3 months ago

Agilex 7 M-Series for Llama

I am working on a project to deploy Llama on the Agilex 7 M-Series using the FPGA AI Suite. However, dla_compiler does not offload the graph to the FPGA. Could it be that the Gather operator is not supported, or that the tensors have dynamic dimensions? Either way, this prevents me from generating a .bin file suitable for the FPGA. In addition, the FPGA AI Suite does not provide an .arch file for HBM, and the plugin's list of selectable devices does not include the M-Series. Could you provide some BSP support instead?
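
If dynamic dimensions are the cause, one way to rule that out is to pin the model's inputs to static shapes before invoking dla_compiler. A minimal sketch with the OpenVINO Python API (the file paths and input names below are placeholders, not values from the toolchain):

    import openvino as ov

    core = ov.Core()

    # Placeholder IR produced earlier with ov.convert_model or similar;
    # substitute your own paths and input names.
    model = core.read_model("llama.xml")

    # Print each input's partial shape; a '?' or a range marks a dynamic
    # dimension, which the FPGA flow may reject.
    for inp in model.inputs:
        print(inp.any_name, inp.get_partial_shape())

    # Pin every input to a fixed shape (batch=1, sequence length=128 is an
    # arbitrary example) so the graph becomes fully static.
    model.reshape({"input_ids": [1, 128], "attention_mask": [1, 128]})

    # Save the static-shape IR and pass the resulting .xml/.bin to dla_compiler.
    ov.save_model(model, "llama_static.xml")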

5 Replies

    • lexie11
      New Contributor

      Hi,

      1. The Agilex 7 M-Series does not provide a BSP. If I want to deploy Llama on Agilex 7, can I only generate the BSP through OFS customization?
      2. In the RTL support provided by OFS, I did not find HBM. Is it necessary to generate it myself through Quartus Prime Pro?
      3. The example_architecture folder provides no information about HBM. How should I modify the .arch file in order to use HBM?

  • JohnT_Altera
    Regular Contributor

    Hi,


    1. The Agilex 7 M-Series does not provide a BSP. If I want to deploy Llama on Agilex 7, can I only generate the BSP through OFS customization?

    Yes, you are correct. You will need to customize it.

    2. In the RTL support provided by OFS, I did not find HBM. Is it necessary to generate it myself through Quartus Prime Pro?

    You can generate the OFS build for the M-Series dev kit with:

        ./ofs-common/scripts/common/syn/build_top.sh --ofss tools/ofss_config/mseries-dk.ofss mseries-dk:flat work_mseries-dk

    Please refer to https://github.com/OFS/ofs-agx7-pcie-attach for all the supported BSPs.

    3. The example_architecture folder provides no information about HBM. How should I modify the .arch file in order to use HBM?

    The .arch file does not need any changes, as this is handled by the BSP. The performance will differ when running through HBM compared to DDR.
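
    If you want to quantify that difference, you can time inference on each build. A minimal sketch with the OpenVINO Python API (the HETERO device string assumes the FPGA AI Suite runtime plugin is registered under the name "FPGA", and the model path is a placeholder):

        import time
        import numpy as np
        import openvino as ov

        core = ov.Core()
        model = core.read_model("llama_static.xml")  # placeholder IR path

        # "HETERO:FPGA,CPU" assumes the FPGA AI Suite plugin is installed
        # and registered under the device name "FPGA"; adjust to your setup.
        compiled = core.compile_model(model, "HETERO:FPGA,CPU")
        request = compiled.create_infer_request()

        # Dummy inputs matching the model's static shapes (int64 assumes
        # token-id inputs; adjust the dtype to your model).
        inputs = {
            inp.any_name: np.zeros(tuple(inp.shape), dtype=np.int64)
            for inp in compiled.inputs
        }

        # Warm up, then time a fixed number of iterations; run the same
        # script against the DDR and HBM builds to compare.
        request.infer(inputs)
        start = time.perf_counter()
        for _ in range(50):
            request.infer(inputs)
        print(f"mean latency: {(time.perf_counter() - start) / 50 * 1000:.2f} ms")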


    Thanks.


    • lexie11
      New Contributor

      I'm glad to see your response. During my further attempts, I encountered some new issues:
      1. If I want to deploy Llama using the SoC approach (ARM + FPGA), should I use the S2M method? Here is the link for reference: https://www.intel.com/content/www/us/en/docs/programmable/848957/2025-1/soc-design-example-system-architecture.html

      2. The solution you provided uses PCIe for transfers. Is it only applicable to the Host + FPGA deployment method?

      3. Does the FPGA AI Suite IP support the offloading of LLM operators under the Transformer architecture? When I use dla_compiler to compile my Llama IR, I am unable to split off subgraphs and can only deploy it on the CPU. (A sketch of how I inspect the device assignment follows at the end of this post.)

      4. When using dla_compiler, if I want to deploy Llama with the HBM architecture, should I select AGX7_Performance_Transform.arch as the corresponding arch file?

      5. And another question: is the process of deploying Llama to the Agilex 7 M-Series in a heterogeneous manner using the SoC approach as follows or not:
      a. Convert the model (ONNX/PyTorch/Transformers) to IR format (.xml, .bin) using OpenVINO.
      b. Select AGX7_Performance_Transform.arch and the corresponding model's .yml file when running dla_compiler in the FPGA AI Suite.
      c. Use dla_create_ip to convert the generated FPGA .bin file into IP.
      d. Add it to Quartus Platform Designer and export the bitstream.
      e. Add the bitstream to the Yocto build.
      f. Write the exported Yocto image to an SD card and run it.
      During this process, I would like to ask: OpenVINO does not generate the .yml file, so how can I obtain it?
      There are no files with HBM in their names in the example_architecture folder. Is it the same as DDR, given that both communicate with the HPS and other IP through the AXI protocol?
      Is the Quartus project file with the IP built by dla_create_ip the same as the example project file provided by S2M?
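
      Regarding question 3, here is the sketch mentioned above for inspecting which operators the HETERO plugin would place on the FPGA versus the CPU (OpenVINO Python API; the "FPGA" device name assumes the FPGA AI Suite runtime plugin is registered, and the model path is a placeholder):

          import openvino as ov

          core = ov.Core()
          model = core.read_model("llama_static.xml")  # placeholder IR path

          # query_model maps each operation name to the device the HETERO
          # plugin would place it on; "FPGA" assumes the FPGA AI Suite
          # runtime plugin is registered under that name.
          supported = core.query_model(model, "HETERO:FPGA,CPU")

          # Operators that fall back to the CPU (e.g. Gather, or anything
          # left with dynamic shapes) show up here.
          on_cpu = [name for name, dev in supported.items() if "CPU" in dev]
          print(f"{len(on_cpu)} of {len(supported)} ops fall back to the CPU")
          for name in on_cpu[:20]:
              print("CPU:", name)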