Invalid work-group size error: DPC++ code running on Intel Arria 10 (oneAPI) on DevCloud

Question

Hello, I am using DevCloud to run my DPC++ code on FPGA hardware for acceleration, on a node with an Arria 10 running oneAPI. The fpga_emu build runs and gives the expected results, but on the FPGA hardware it fails with this error:
Caught a SYCL host exception:
Non-uniform work-groups are not supported by the target device -54 (CL_INVALID_WORK_GROUP_SIZE)
terminate called after throwing an instance of 'cl::sycl::nd_range_error'
  what():  Non-uniform work-groups are not supported by the target device -54 (CL_INVALID_WORK_GROUP_SIZE)
Aborted

I don't see any problem with the work-group sizes (32 groups of 500 work-items, 16,000 in total):
range<1> num_items{dataset.size()};

res.resize(dataset.size());
buffer dataset_buf(linear_dataset);
buffer curr_test_buf(curr_test);
buffer res_buf(res.data(), num_items);

std::cout << "submit a job" << std::endl;
//auto start = std::chrono::high_resolution_clock::now();
{
    q.submit([&](handler& h) {
        accessor a(dataset_buf, h, read_only);
        accessor b(curr_test_buf, h, read_only);
        // no_init discards the old contents of res_buf, so dif is write-only
        // and each result is written once instead of being updated with +=.
        accessor dif(res_buf, h, write_only, no_init);

        // 32 work-groups of 500 work-items each: 16,000 work-items in total.
        h.parallel_for_work_group(range<1>(32), range<1>(500), [=](group<1> g) {
            g.parallel_for_work_item([&](h_item<1> item) {
                size_t i = item.get_global_id(0);
                double sum = 0.0;
                for (int j = 0; j < 5; ++j) {
                    sum += (b[j] - a[i * 5 + j]) * (b[j] - a[i * 5 + j]);
                }
                dif[i] = sum;
                // out << "i : " << i << " i[0]: " << i[0] << " b: " << b[0] << cl::sycl::endl;
            });
        });
    }).wait();
}

I previously used a plain parallel_for, as below, but it took a very long time on the FPGA hardware and gave no speedup at all, which is why I thought of work-groups:
range<1> num_items{dataset.size()};
std::vector<double> res;

res.resize(dataset.size());
buffer dataset_buf(linear_dataset);
buffer curr_test_buf(curr_test);
buffer res_buf(res.data(), num_items);

std::cout << "submit a job" << std::endl;
//auto start = std::chrono::high_resolution_clock::now();
{
    q.submit([&](handler& h) {
        accessor a(dataset_buf, h, read_only);
        accessor b(curr_test_buf, h, read_only);
        // no_init discards the old contents of res_buf, so dif is write-only
        // and each result is written once instead of being updated with +=.
        accessor dif(res_buf, h, write_only, no_init);

        h.parallel_for(num_items, [=](auto i) {
            // dif[i] = a[i].size() * 1.0; // a[i];
            double sum = 0.0;
            for (int j = 0; j < 5; ++j) {
                sum += (b[j] - a[i * 5 + j]) * (b[j] - a[i * 5 + j]);
            }
            dif[i] = sum;
            // out << "i : " << i << " i[0]: " << i[0] << " b: " << b[0] << cl::sycl::endl;
        });
    }).wait();
}
 Thanks a lot!

aikeu · Answer

Hi amaltaha, can you share the project you are trying to run with me by email? I can try to run it on my side and see. Thanks. Regards, Aik Eu

amaltaha · Answer

Hello Aik Eu!
My goal was speed, so I tried to split the 16,000 samples (each with 5 double-precision features) into smaller chunks, but it didn't work.

Thank you!

aikeu · Answer

Hi amaltaha, do you mean the error is still there, or is it due to how your design handles it? Thanks. Regards, Aik Eu

aikeu · Answer

Hi amaltaha, I will close this thread if there are no further questions. Thanks. Regards, Aik Eu
