Support for GPU and FPGA programming

Question

Hi, I was wondering: does oneAPI support GPU and FPGA programming in the same code? The tutorial didn't give me a very specific idea.

jananic_intel · Answer

Hi,
Thanks for posting in the Intel forums. Intel oneAPI has samples that support CPU, GPU, and FPGA. You can find them through oneapi-cli on DevCloud. Links to similar samples are below; try them to learn more.
https://github.com/oneapi-src/oneAPI-samples/tree/master/DirectProgramming/DPC%2B%2B/DenseLinearAlgebra/simple-add
https://github.com/oneapi-src/oneAPI-samples/tree/master/DirectProgramming/DPC%2B%2B/DenseLinearAlgebra/vector-add
Hope this helps!

gao__chao · Answer

Hi,
Sorry to bother you again, but I don't think this is what I am looking for. I was wondering whether I can write a DPC++ file in which the GPU and FPGA work in parallel. Can this be achieved in a single file?
Thanks

jananic_intel · Answer

Hi,
Yes, this is possible by launching two kernels with different device selectors: one with gpu_selector and the other with intel::fpga_selector. Because kernel submission is asynchronous, both kernels can run in parallel.
Thanks.

gao__chao · Answer

Hi,
Could you give me some examples?
I didn't find any example that uses the GPU and the FPGA in two kernels in the same DPC++ file. I also guess the makefile is different; how do we compile that?
Thanks
Chao

abhishekd_intel · Answer

Hi Chao,

I am assuming that you want to run your code in parallel on the iGPU and fpga_emulator.
Please find below a code sample that runs a simple vector add on the iGPU and fpga_emulator.

#include <cstdlib>
#include <iostream>
#include <CL/sycl.hpp>
// Header path and namespace below match the oneAPI beta this was written for;
// later releases moved the FPGA extensions (for example under sycl/ext/intel).
#include <CL/sycl/intel/fpga_extensions.hpp>

#define N 10

int main(int, char**) {
    // Host data for device 1 (GPU).
    float *d1_a = (float *)malloc(N * sizeof(float));
    float *d1_b = (float *)malloc(N * sizeof(float));
    float *d1_c = (float *)malloc(N * sizeof(float));

    // Host data for device 2 (FPGA emulator).
    float *d2_a = (float *)malloc(N * sizeof(float));
    float *d2_b = (float *)malloc(N * sizeof(float));
    float *d2_c = (float *)malloc(N * sizeof(float));

    for (int i = 0; i < N; i++) {
        d1_a[i] = i;
        d1_b[i] = N - i;
        d2_a[i] = i;
        d2_b[i] = N - i;
    }

    // Handler for asynchronous errors reported by the SYCL runtime.
    auto exception_handler = [](cl::sycl::exception_list exceptions) {
        for (std::exception_ptr const& e : exceptions) {
            try {
                std::rethrow_exception(e);
            } catch (cl::sycl::exception const& ex) {
                std::cout << "Caught asynchronous SYCL exception:\n"
                          << ex.what() << std::endl;
            }
        }
    };

    // One queue per device: the selector decides where each kernel runs.
    cl::sycl::queue queue_d1(cl::sycl::gpu_selector{}, exception_handler);
    cl::sycl::queue queue_d2(cl::sycl::intel::fpga_emulator_selector{}, exception_handler);
    // Uncomment to print the device a queue is bound to:
    // std::cout << "Running on "
    //           << queue_d2.get_device().get_info<cl::sycl::info::device::name>()
    //           << "\n";

    {
        // Buffers wrap the host arrays; results are written back to the host
        // when the buffers are destroyed at the end of this scope.
        cl::sycl::buffer<float, 1> d1_a_sycl{d1_a, cl::sycl::range<1>{N}};
        cl::sycl::buffer<float, 1> d1_b_sycl{d1_b, cl::sycl::range<1>{N}};
        cl::sycl::buffer<float, 1> d1_c_sycl{d1_c, cl::sycl::range<1>{N}};

        cl::sycl::buffer<float, 1> d2_a_sycl{d2_a, cl::sycl::range<1>{N}};
        cl::sycl::buffer<float, 1> d2_b_sycl{d2_b, cl::sycl::range<1>{N}};
        cl::sycl::buffer<float, 1> d2_c_sycl{d2_c, cl::sycl::range<1>{N}};

        // Kernel 1: vector add on the GPU. submit() returns without waiting.
        queue_d1.submit([&](cl::sycl::handler& cgh) {
            auto a_acc = d1_a_sycl.get_access<cl::sycl::access::mode::read>(cgh);
            auto b_acc = d1_b_sycl.get_access<cl::sycl::access::mode::read>(cgh);
            auto c_acc = d1_c_sycl.get_access<cl::sycl::access::mode::discard_write>(cgh);

            cgh.parallel_for<class vector_addition_d1>(cl::sycl::range<1>{N}, [=](cl::sycl::id<1> idx) {
                c_acc[idx] = a_acc[idx] + b_acc[idx];
            });
        });

        // Kernel 2: the same vector add on the FPGA emulator.
        queue_d2.submit([&](cl::sycl::handler& cgh) {
            auto a_acc = d2_a_sycl.get_access<cl::sycl::access::mode::read>(cgh);
            auto b_acc = d2_b_sycl.get_access<cl::sycl::access::mode::read>(cgh);
            auto c_acc = d2_c_sycl.get_access<cl::sycl::access::mode::discard_write>(cgh);

            cgh.parallel_for<class vector_addition_d2>(cl::sycl::range<1>{N}, [=](cl::sycl::id<1> idx) {
                c_acc[idx] = a_acc[idx] + b_acc[idx];
            });
        });
    }

    // Wait for both queues and pass any asynchronous errors to the handler.
    try {
        queue_d1.wait_and_throw();
        queue_d2.wait_and_throw();
    } catch (cl::sycl::exception const& e) {
        std::cout << "Caught synchronous SYCL exception:\n"
                  << e.what() << std::endl;
    }

    // Print the GPU result, then the FPGA emulator result.
    for (int i = 0; i < N; i++)
        std::cout << d1_c[i] << " ";
    std::cout << std::endl;

    for (int i = 0; i < N; i++)
        std::cout << d2_c[i] << " ";
    std::cout << std::endl;

    free(d1_a); free(d1_b); free(d1_c);
    free(d2_a); free(d2_b); free(d2_c);

    return 0;
}

You can change the device selector according to your use-case.
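
As a quick sanity check on the sample above: since a[i] = i and b[i] = N - i, every element of both result arrays should equal N, so each output line should read 10 ten times. A minimal host-side sketch of that check follows. The name all_close_to is a hypothetical helper introduced here for illustration; it is plain C++ and not part of SYCL or the original post.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper (not a SYCL API): returns true when each of the first
// n elements of data is within tol of expected. An empty range returns true.
bool all_close_to(const float* data, std::size_t n, float expected,
                  float tol = 1e-6f) {
    for (std::size_t i = 0; i < n; ++i) {
        if (std::fabs(data[i] - expected) > tol) {
            return false;
        }
    }
    return true;
}
```

In the sample, it could be called once the buffer scope has closed, for example all_close_to(d1_c, N, N) for the GPU result and all_close_to(d2_c, N, N) for the FPGA emulator result.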

Warm Regards,
Abhishek
