Thank you for your response. While experimenting further, I ran into some new issues:
1. If I want to deploy LLaMA using the SoC approach (ARM + FPGA), should I use the S2M variant of the design example? For reference: https://www.intel.com/content/www/us/en/docs/programmable/848957/2025-1/soc-design-example-system-architecture.html
2. The solution you provided uses PCIe for data transfer. Does that mean it only applies to Host + FPGA deployments?
3. Does the FPGA AI Suite IP support offloading LLM operators from Transformer-based models? When I compile my LLaMA IR with dla_compile, the graph is not split into FPGA subgraphs, so everything can only run on the CPU.
4. When using dla_compile to deploy LLaMA on an HBM-based architecture, should I select AGX7_Performance_Transform.arch as the arch file?
5. One more question: is the following the correct process for deploying LLaMA heterogeneously on the AGX7 M using the SoC approach?
a. Convert the model (ONNX/PyTorch/Transformer) to IR format (.xml, .bin) with OpenVINO.
b. Run dla_compile from the FPGA AI Suite with AGX7_Performance_Transform.arch and the model's corresponding .yml file.
c. Use dla_create_ip to turn the generated FPGA-ready .bin file into an IP.
d. Add the IP in Quartus Platform Designer and generate the bitstream.
e. Add the bitstream to the Yocto build.
f. Write the resulting Yocto image to the SD card and boot it.
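A minimal sketch of step a, assuming the ONNX export route and the OpenVINO 2023.1+ Python API (`openvino.convert_model` and `openvino.save_model`). The function names and file names are hypothetical, and the sketch only produces the IR pair that step b consumes; it does not set anything FPGA-specific.

```python
from pathlib import Path


def ir_paths(xml_path):
    """Return the (.xml, .bin) pair that makes up one OpenVINO IR model.

    The .xml holds the network topology and the .bin holds the weights;
    both share the same file stem.
    """
    xml = Path(xml_path).with_suffix(".xml")
    return xml, xml.with_suffix(".bin")


def convert_onnx_to_ir(onnx_path, xml_path):
    """Hypothetical helper for step a: convert an ONNX model to OpenVINO IR.

    Assumes OpenVINO 2023.1 or newer. The import happens inside the
    function so ir_paths() stays usable without OpenVINO installed.
    """
    import openvino as ov

    model = ov.convert_model(str(onnx_path))
    xml, _ = ir_paths(xml_path)
    # save_model writes the .xml and the matching .bin side by side;
    # by default it compresses weights to FP16.
    ov.save_model(model, str(xml))
    return ir_paths(xml)
```

Calling `convert_onnx_to_ir("llama.onnx", "llama.xml")` would write `llama.xml` and `llama.bin`. For a PyTorch model, `ov.convert_model` also accepts a `torch.nn.Module` together with an `example_input`.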
Regarding this process: OpenVINO does not generate a .yml file. How can I obtain it?
There are no files with HBM in the name in the example_architecture folder. Can I use the DDR ones instead, since both communicate with the HPS and other IPs over the AXI protocol?
Is the Quartus project containing the IP built by dla_create_ip the same as the example project provided for S2M?