Data level parallelism on FPGA with kernel replication using oneAPI
- 3 years ago
Hi asenjo,
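Sorry for the late reply. I consulted one of my team members about your question. In your code, the buffers go out of scope at the end of the VectorAdd() function. A buffer's destructor waits for the kernel that uses it to finish, so the first call blocks before the second kernel is even submitted, and the kernels get serialized instead of running in parallel. Creating the buffers in main() and passing them in avoids this.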
The main() would look something like this:
// Buffers for the first half of the data
buffer a_buf1{a_vector.begin()+begin1, a_vector.begin()+end1};
buffer b_buf1{b_vector.begin()+begin1, b_vector.begin()+end1};
buffer sum_buf1{sum_parallel.begin()+begin1, sum_parallel.begin()+end1};

// Buffers for the second half of the data
buffer a_buf2{a_vector.begin()+begin2, a_vector.begin()+end2};
buffer b_buf2{b_vector.begin()+begin2, b_vector.begin()+end2};
buffer sum_buf2{sum_parallel.begin()+begin2, sum_parallel.begin()+end2};

// Submit both kernels before waiting, so neither blocks the other
auto e0 = VectorAdd<true,0,2,4>(q, a_buf1, b_buf1, sum_buf1);
auto e1 = VectorAdd<true,1,2,4>(q, a_buf2, b_buf2, sum_buf2);
q.wait();
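To see why the placement of the buffers matters, here is a toy model in standard C++ (no SYCL needed). ToyBuffer, submit, and the two *_order functions are made-up names for illustration only. ToyBuffer mimics just the one behaviour that matters here: a destructor that blocks until the work using it has finished.

```cpp
#include <future>
#include <string>
#include <vector>

// Toy stand-in for a buffer (NOT a SYCL type): the destructor blocks
// until the work attached to it is done, then records a "sync" entry.
struct ToyBuffer {
  int id;
  std::vector<std::string>& log;
  std::future<void> pending;

  ToyBuffer(int id_, std::vector<std::string>& log_) : id(id_), log(log_) {}

  ~ToyBuffer() {
    if (pending.valid()) pending.wait();  // block until the "kernel" ends
    log.push_back("sync " + std::to_string(id));
  }
};

// Toy stand-in for a kernel submission: starts asynchronous work.
void submit(ToyBuffer& buf) {
  buf.log.push_back("submit " + std::to_string(buf.id));
  buf.pending = std::async(std::launch::async, [] { /* kernel work */ });
}

// Buffer is local to the helper, so its destructor runs (and blocks)
// before the helper returns.
void add_with_local_buffer(int id, std::vector<std::string>& log) {
  ToyBuffer buf{id, log};
  submit(buf);
}

// Pattern from the original question: one helper call per kernel.
std::vector<std::string> serialized_order() {
  std::vector<std::string> log;
  add_with_local_buffer(0, log);
  add_with_local_buffer(1, log);
  return log;
}

// Pattern from the listing above: buffers outlive both submissions.
std::vector<std::string> overlapped_order() {
  std::vector<std::string> log;
  {
    ToyBuffer buf0{0, log};
    ToyBuffer buf1{1, log};
    submit(buf0);
    submit(buf1);
  }  // destructors run here, in reverse order of construction
  return log;
}
```

serialized_order() records submit 0, sync 0, submit 1, sync 1, while overlapped_order() records submit 0, submit 1, sync 1, sync 0. In the second case both pieces of work are in flight before anything blocks, which is what the main() above achieves.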
Another option is to use sub-buffers, which are views into part of a parent buffer.
A third option is to use USM (Unified Shared Memory), but then the user is responsible for copying the data back and forth themselves:
https://www.intel.com/content/www/us/en/developer/articles/code-sample/vector-add.html
Thanks.
Regards,
Aik Eu