Forum Discussion

student4
New Contributor
4 years ago

DPC++ project built for FPGA emulation takes a very long time to run on CPU

Hi,

I have a project developed in standard C++ and another project with the same functionality developed using DPC++ libraries and built in FPGA emulation mode. The standard C++ version runs fine when used with my test project, but the DPC++ version takes a very long time to produce results in the same test program. Should I disable or change any project settings to make it run faster?

I appreciate your help.

Thank you.

6 Replies

  • Hi @student4,

    Thank you for posting in the Intel community forum and for your interest in oneAPI. I hope all is well.
    May I ask which libraries are used in the DPC++ code? Would it be convenient for you to share the test project you mentioned, which contains both the C++ and DPC++ code?

    Also, are you using the Intel DevCloud to run the emulation?
    I hope to hear from you soon.

    Best Wishes
    BB

    • student4
      New Contributor

      Hi BB,

      Thank you for your reply.

      I am using Microsoft Visual Studio 2019 to run the emulation.

      I am not able to attach the file. Please refer to the test project implementation below:

      #include <iostream>
      #include <numeric>
      #include <chrono>
      #include <iomanip>
      #include <complex>
      #include <array>
      #include <vector>
      #include <fstream>

      // Define exactly one of these to select the implementation under test.
      //#define standard_cpp
      #define oneApi_FPGA

      #ifdef standard_cpp
      #include "../Standard_CPP/standardCPPFile.h"
      #endif

      #ifdef oneApi_FPGA
      #include "../Oneapi_FPGA/DPCPPFile.h"
      #endif

      using namespace std;

      void Test()
      {
          // Instantiate the class that matches the selected header.
      #ifdef standard_cpp
          standardCPPFile obj;
      #endif
      #ifdef oneApi_FPGA
          DPCPPFile obj;
      #endif
          // Call the function under test repeatedly.
          for (int i = 0; i < 999000; i++)
          {
              obj.func();
          }
      }

      int main()
      {
          Test();
          cout << "success" << endl;
          return 0;
      }

      The test project calls either the standard C++ or the DPC++ version of the function func, depending on which macro is defined.

      This is a sample implementation of func in DPC++.

      void DPCPPFile::func(int v)
      {
          std::vector<std::complex<float>> Atemp(7);
          std::vector<std::complex<float>> Btemp(7);
          std::vector<std::vector<std::complex<float>>> Ztemp(7);
          // std::cout << "find best symbol index is" << v << std::endl;

          // Select the FPGA emulator device and create a queue on it.
          cl::sycl::ext::intel::fpga_emulator_selector d_selector;
          cl::sycl::queue Q(d_selector);

          // Buffers wrapping the host vectors.
          sycl::buffer AtempBuff(Atemp);
          sycl::buffer BtempBuff(Btemp);
          sycl::buffer ZtempBuff(Ztemp);

          Q.submit([&](sycl::handler& h)
          {
              sycl::accessor AtempAccess(AtempBuff, h, sycl::write_only);
              sycl::accessor BtempAccess(BtempBuff, h, sycl::read_only);
              sycl::accessor ZtempAccess(ZtempBuff, h, sycl::read_only);

              // One work-item per element. Nr is not declared in this snippet.
              h.parallel_for(sycl::range<1>(Nr), [=](auto idx)
              {
                  AtempAccess[idx] = std::conj(BtempAccess[v - 1]) * ZtempAccess[v][idx];
              });
          });

          // Creating a host accessor waits for the kernel to finish.
          sycl::host_accessor AtempHost(AtempBuff);
      }

      Below is the same implementation in standard C++:

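      void standardCPPFile::func(int v, int Nr)
      {
          std::vector<std::complex<float>> Atemp(7);
          std::vector<std::complex<float>> Btemp(7);
          // 7x7 so that Ztemp[v][i] refers to a valid element.
          std::vector<std::vector<std::complex<float>>> Ztemp(7, std::vector<std::complex<float>>(7));
          // std::cout << "find best symbol index is" << v << std::endl;

          // Index with the loop variable i, not with the bound Nr.
          for (int i = 0; i < Nr; i++)
          {
              Atemp[i] += std::conj(Btemp[v - 1]) * Ztemp[v][i];
          }
      }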

      Thank you.
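
The per-element operation that both snippets above appear to intend, A[i] = conj(B[v - 1]) * Z[v][i], can be sketched as a small standalone function in standard C++. This is a minimal sketch, not the poster's code: the name `compute_row`, the `cfloat` alias, and passing `B` and `Z` as parameters are assumptions made for illustration.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// Hypothetical helper (not from the original post): returns a vector A with
// A[i] = conj(B[v - 1]) * Z[v][i] for every element i in row v of Z.
std::vector<cfloat> compute_row(const std::vector<cfloat>& B,
                                const std::vector<std::vector<cfloat>>& Z,
                                std::size_t v)
{
    // As in the posted code, v - 1 indexes B and v selects a row of Z.
    assert(v >= 1 && v - 1 < B.size() && v < Z.size());

    const cfloat scale = std::conj(B[v - 1]);  // loop-invariant, computed once
    std::vector<cfloat> A(Z[v].size());
    for (std::size_t i = 0; i < A.size(); ++i) {
        A[i] = scale * Z[v][i];
    }
    return A;
}
```

A function like this can serve as a reference result when checking the output of the emulated kernel for the same inputs.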

  • HRZ
    Frequent Contributor

    Since you are building in emulation mode, it is expected that the code would take a very long time to execute since it is emulating the FPGA hardware in software. If you compile for and run your code on an actual FPGA, then it will be much faster. Emulation mode is just to ensure code correctness; the time it takes for the application to execute in emulation mode does NOT represent the time it would take for the code to run on an actual FPGA.
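
To see how the two builds compare per call, one option is to time the loop in the test program with std::chrono, which the posted test project already includes. The following is a minimal sketch under that assumption; `mean_microseconds_per_call` is a hypothetical helper name, not part of the original project or of any oneAPI library.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: calls fn() `iterations` times and returns the
// mean wall-clock time per call in microseconds.
template <typename Fn>
double mean_microseconds_per_call(Fn&& fn, std::size_t iterations)
{
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    if (iterations == 0) {
        return 0.0;
    }
    const auto start = clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        fn();
    }
    const auto stop = clock::now();
    const std::chrono::duration<double, std::micro> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Wrapping the call under test in a lambda, for example `mean_microseconds_per_call([&] { obj.func(v); }, 1000)`, gives a per-call figure for each build. In the posted DPC++ version every call also constructs a queue and three buffers, so part of the measured time is likely setup cost rather than kernel execution.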

  • Hi @student4,


    Good day. Just checking in to see whether you have any further questions on this matter.

    I hope we have clarified your doubts.


    Best Wishes

    BB


  • Hi @student4,


    Greetings. As we have not received any further questions on the information provided, we will assume the issue has been resolved, and this thread will no longer be monitored. For new queries, please feel free to open a new thread and we will be right with you. It was a pleasure having you here.


    Best Wishes

    BB