Forum Discussion

Björne2
New Contributor
2 years ago

Advice on CNN inference on Agilex 7 using oneAPI

Greetings everyone,

I'm tasked with porting a CNN trained with PyTorch to an Agilex 7 FPGA using HLS. I think the right tool for the job is oneAPI. Since this is not a completely novel task, I wonder whether there are any existing implementations, libraries, or similar material I can reuse. I'd prefer not to have to implement everything from scratch, from loading weights to max pooling layers to fixed-point numerics. I'd be happy with any pointers to materials or tutorials you might have.

Thanks in advance.

15 Replies

  • Do you have a board in mind? Keep in mind that if you wish to use oneAPI FPGA Acceleration, you should choose an acceleration card with a supported BSP. We have a list of vendor cards on our homepage:

    https://cdrdv2.intel.com/v1/dl/getContent/824530

    Instead of building a full oneAPI BSP, you can also use the oneAPI DPC++/C++ compiler to create IP that you can integrate into a Platform Designer system. We demonstrate this in the Platform Designer code sample and the Nios V reference design. You can also learn more about IP interface customization by studying the HLS Flow Interfaces code samples.

    Manually integrating your IP with Platform Designer (or SystemVerilog/VHDL if you are so inclined) gives you the ability to accelerate the embedded HPS, so you are not tied to an x86-64 host CPU.

    • Björne2
      New Contributor

      > Do you have a board in mind? Keep in mind that if you wish to use
      > oneAPI FPGA Acceleration, you should choose an acceleration card
      > with a supported BSP. We have a list of vendor cards on our
      > homepage:

      Yeah, the board is a DE10 Agilex 7 from Terasic. The exact model is
      AGF 7 014 B2E2_8GBx4.

      > Instead of building a full oneAPI BSP, you can also use the oneAPI
      > DPC++/C++ compiler to create IP that you can integrate using a
      > platform designer system.

      Well, I have a server license for Quartus Prime 21.2. Previously I
      have used the aoc (Intel(R) FPGA SDK for OpenCL(TM) Kernel Compiler)
      command to build FPGA bitstreams from OpenCL code so I think I already
      have a suitable BSP installed. What I'm missing is how to "connect"
      icpx (Intel(R) oneAPI DPC++/C++ Compiler) to the FPGA. It was easy
      with OpenCL. I just compiled the kernel with aoc and then loaded it
      onto the FPGA with OpenCL host code. It appears it is not that easy
      with SYCL.

      • whitepau_altera
        Contributor

        There is an additional step. With OpenCL (and indeed, earlier versions of oneAPI) we shipped some popular BSPs along with the tools, but we stopped doing that in 2022 to limit the installation size. You should be able to get a BSP from Terasic (indeed, it should have been provided when you purchased the board). Once you install the BSP, you can point the compiler to it when you compile your code. We explain how to do this in the code samples.

        Managing an FPGA Board

        FPGA Compile Code Sample

        FYI: that BSP depends on an older version of Quartus, and unless Terasic updates the BSP, it will fall out of the support window.
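
        As a rough illustration of what pointing the compiler at a BSP can look like in practice, here is a minimal sketch, not the code from the samples. It assumes a recent oneAPI release that provides the `fpga_selector_v`, `fpga_simulator_selector_v`, and `fpga_emulator_selector_v` selectors (older releases, such as those paired with Quartus Prime 21.2, may spell them differently). The helper names `target_name`, `make_fpga_queue`, and `report_target`, the macro `SKETCH_HAS_SYCL`, and the `<bsp_dir>:<board_variant>` placeholders are hypothetical; the real values come from the BSP you install.

```cpp
// Possible compile commands (placeholders in <> are assumptions, not real paths):
//   Emulation: icpx -fsycl -fintelfpga main.cpp -o main.emu
//   Hardware:  icpx -fsycl -fintelfpga -DFPGA_HARDWARE -Xshardware \
//                   -Xstarget=<bsp_dir>:<board_variant> main.cpp -o main.fpga
#include <iostream>
#include <string>

// With the oneAPI compiler the SYCL path is enabled; a plain C++17 compiler
// without SYCL headers still builds the target-name helper below.
#if __has_include(<sycl/sycl.hpp>)
#include <sycl/sycl.hpp>
#include <sycl/ext/intel/fpga_extensions.hpp>
#define SKETCH_HAS_SYCL 1
#else
#define SKETCH_HAS_SYCL 0
#endif

// Name of the target chosen at compile time via -DFPGA_HARDWARE or
// -DFPGA_SIMULATOR; the emulator is the default.
std::string target_name() {
#if defined(FPGA_HARDWARE)
    return "hardware";
#elif defined(FPGA_SIMULATOR)
    return "simulator";
#else
    return "emulator";
#endif
}

#if SKETCH_HAS_SYCL
// Create a queue on the device that matches the compile-time target.
sycl::queue make_fpga_queue() {
#if defined(FPGA_HARDWARE)
    return sycl::queue{sycl::ext::intel::fpga_selector_v};
#elif defined(FPGA_SIMULATOR)
    return sycl::queue{sycl::ext::intel::fpga_simulator_selector_v};
#else
    return sycl::queue{sycl::ext::intel::fpga_emulator_selector_v};
#endif
}
#endif

// Print the selected target and, when SYCL is available, the device name.
void report_target() {
    std::cout << "target: " << target_name() << '\n';
#if SKETCH_HAS_SYCL
    sycl::queue q = make_fpga_queue();
    std::cout << "device: "
              << q.get_device().get_info<sycl::info::device::name>() << '\n';
#endif
}
```

        Calling `report_target()` from `main` in an emulator build is a quick way to confirm the toolchain works before starting a long hardware compile against the BSP.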

  • haoyanwa
    New Contributor

    Thank you for reaching out! I recommend checking out HLS4ML: HLS4ML GitHub Repository.

    It is a framework designed to convert machine learning models from popular libraries like PyTorch and Keras into FPGA binaries. It integrates seamlessly with oneAPI by utilizing the DPC++/C++ compiler in the backend to generate IP blocks that represent the different components of your model, such as layers, activation functions, and more.

    While HLS4ML is still a work in progress, it offers a good starting point and can save you a great deal of time compared with implementing everything from scratch, including handling weights, pooling layers, and fixed-point arithmetic.
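
    To give a sense of the hand-written pieces a framework like this is meant to replace, here is a minimal plain C++ sketch of fixed-point quantization and 2x2 max pooling. It is not HLS4ML's implementation. The Q8.8 format and the names `fixed_t`, `kFracBits`, `to_fixed`, `to_float`, `fixed_mul`, and `max_pool_2x2` are assumptions chosen for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical Q8.8 format: 16-bit signed value with 8 fractional bits.
using fixed_t = std::int16_t;
constexpr int kFracBits = 8;

// Quantize a float weight or activation to Q8.8, saturating at the range limits.
fixed_t to_fixed(float x) {
    const float scaled = std::round(x * static_cast<float>(1 << kFracBits));
    const float hi = std::numeric_limits<fixed_t>::max();
    const float lo = std::numeric_limits<fixed_t>::lowest();
    return static_cast<fixed_t>(std::min(std::max(scaled, lo), hi));
}

// Convert a Q8.8 value back to float, e.g. to compare against PyTorch outputs.
float to_float(fixed_t x) {
    return static_cast<float>(x) / static_cast<float>(1 << kFracBits);
}

// Multiply two Q8.8 values: widen to 32 bits, then drop the extra fractional
// bits. This sketch truncates and does not saturate on overflow.
fixed_t fixed_mul(fixed_t a, fixed_t b) {
    const std::int32_t wide = static_cast<std::int32_t>(a) * b;
    return static_cast<fixed_t>(wide >> kFracBits);
}

// 2x2 max pooling with stride 2 over one row-major feature map of size h x w.
std::vector<fixed_t> max_pool_2x2(const std::vector<fixed_t>& in,
                                  std::size_t h, std::size_t w) {
    assert(h % 2 == 0 && w % 2 == 0 && in.size() == h * w);
    const std::size_t out_w = w / 2;
    std::vector<fixed_t> out((h / 2) * out_w);
    for (std::size_t oy = 0; oy < h / 2; ++oy) {
        for (std::size_t ox = 0; ox < out_w; ++ox) {
            fixed_t best = in[(2 * oy) * w + 2 * ox];
            for (std::size_t dy = 0; dy < 2; ++dy) {
                for (std::size_t dx = 0; dx < 2; ++dx) {
                    best = std::max(best, in[(2 * oy + dy) * w + 2 * ox + dx]);
                }
            }
            out[oy * out_w + ox] = best;
        }
    }
    return out;
}
```

    Max pooling only compares values, so it is exact in fixed point. The precision trade-offs show up in quantization and multiplication, which is where a tool that manages bit widths per layer tends to save the most effort.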

    For a step-by-step guide on how to get started, you can explore their tutorials here: HLS4ML Tutorials. These Jupyter Notebooks walk you through the process from building and training a model to emulating it on an FPGA.

    Let us know if you have any further questions.

  • Hi @Björne2,


    Good to know that it is working now. Since we have seen no further questions on this thread, it will be transitioned to community support. If you still need help from me, please log in to ‘https://supporttickets.intel.com’, view the details of the desired request, and post a response within the next 15 days; after that, the thread moves to community support.

    Thank you for the questions; as always, it is a pleasure having you here.


    Best Wishes

    BB