Forum Discussion

Altera_Forum
Honored Contributor
10 years ago

What is the best work-group size, SIMD width and number of compute units for a kernel?

Hi, I recently came to the OpenCL framework and want to implement a Sobel filter as a test case. I know that, unlike a GPU, an FPGA has a flexible architecture and the designer is responsible for creating it. A good thing about the GPU implementation was that the GPU OpenCL compiler can automatically choose the best work-group size for the designer, or there is a "kernel analyzer" that suggests the best work-group size before compilation.

Since every FPGA compilation takes around 1 hour and emulation timing does not reflect hardware performance, is there any way (like the "kernel analyzer" for GPUs) to find the best optimization for my kernel on the FPGA other than trial and error? And what is the best combination of compute units and SIMD for a single work-item kernel?

The first attached file is an NDRange kernel for the Sobel filter; the second attached file is a single work-item kernel for the Sobel filter. Thanks.
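Until better analysis tools exist, the sweep can at least be pruned before launching hour-long compilations. The C sketch below is a hypothetical helper (`enumerate_configs` and `config_t` are made-up names) that lists NDRange configurations satisfying the constraints as I understand them from the SDK documentation: `num_simd_work_items` is 2, 4, 8 or 16 (1 here means leaving the attribute off) and must evenly divide the X dimension of `reqd_work_group_size`, and the work-group size must evenly divide the global size. It says nothing about fmax or area. As far as I know, SIMD vectorization applies only to NDRange kernels; single work-item kernels are widened with loop unrolling instead.

```c
#include <assert.h>
#include <stddef.h>

/* Candidate num_simd_work_items values; 1 stands for "attribute omitted". */
static const int SIMD_VALUES[] = {1, 2, 4, 8, 16};
#define N_SIMD (sizeof SIMD_VALUES / sizeof SIMD_VALUES[0])

/* One candidate build: reqd_work_group_size X, num_simd_work_items, num_compute_units. */
typedef struct {
    int wg;
    int simd;
    int cu;
} config_t;

/* Hypothetical helper: writes up to max_out valid configurations into out and
 * returns the total number found (which may exceed max_out). */
int enumerate_configs(int global_x, const int *wg_sizes, int n_wg, int max_cu,
                      config_t *out, int max_out) {
    assert(global_x > 0 && max_cu >= 1);
    int n = 0;
    for (int i = 0; i < n_wg; i++) {
        int wg = wg_sizes[i];
        /* OpenCL 1.x: the local size must evenly divide the global size. */
        if (wg <= 0 || global_x % wg != 0) continue;
        for (size_t s = 0; s < N_SIMD; s++) {
            int simd = SIMD_VALUES[s];
            /* Assumed rule: SIMD width must evenly divide the work-group size. */
            if (wg % simd != 0) continue;
            for (int cu = 1; cu <= max_cu; cu++) {
                if (n < max_out) {
                    out[n].wg = wg;
                    out[n].simd = simd;
                    out[n].cu = cu;
                }
                n++;
            }
        }
    }
    return n;
}
```

For example, `enumerate_configs(640, (const int[]){16, 32, 64, 128, 256}, 5, 2, buf, 64)` keeps 40 candidates for a 640-pixel-wide image: 256 is dropped because it does not divide 640, and each remaining size accepts all five SIMD values with one or two compute units.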

1 Reply

  • Altera_Forum
    Honored Contributor

    As of now, the tools for profiling and kernel analysis are very limited. I agree that, when designing for performance, finding the best version by trial and error is unrealistic, since very large designs can take around 6 hours or more to compile. For now, if you want to know which implementation gives the best performance, the way to go is to understand how OpenCL code maps onto the FPGA fabric. The optimization guide does a pretty decent job of explaining how to structure a kernel so that it gives the tools as much information as possible, letting them optimize it as well as they can. The profiler only reports performance in terms of memory accesses. The features you've suggested would definitely come in handy, especially during the fine-tuning stage, to squeeze as much performance out of the FPGA as possible.
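    To make "understanding the mapping" concrete for the Sobel case: a common FPGA-friendly structure for a single work-item kernel is a sliding-window shift register that takes in one pixel per loop iteration, so the compiler can pipeline the loop to accept a new pixel every clock. The plain C model below is a hypothetical sketch of that data flow (the function names, `MAX_W`, and the |gx| + |gy| magnitude approximation are my assumptions, not the attached kernels), checked against a direct 3x3 computation. In the actual kernel the shift would be a `#pragma unroll` loop over a private array so that it becomes registers rather than memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_W 64 /* assumed maximum image width for this sketch */

/* Sobel magnitude approximated as |gx| + |gy|, clamped to 8 bits. */
static int sobel3x3(int tl, int tm, int tr, int ml, int mr, int bl, int bm, int br) {
    int gx = (tr + 2 * mr + br) - (tl + 2 * ml + bl);
    int gy = (bl + 2 * bm + br) - (tl + 2 * tm + tr);
    int mag = abs(gx) + abs(gy);
    return mag > 255 ? 255 : mag;
}

/* Reference: direct 3x3 window on a row-major w x h image; border pixels are 0. */
void sobel_direct(const unsigned char *in, unsigned char *out, int w, int h) {
    memset(out, 0, (size_t)w * h);
    for (int r = 1; r < h - 1; r++) {
        for (int c = 1; c < w - 1; c++) {
            const unsigned char *p = in + (size_t)r * w + c;
            out[(size_t)r * w + c] = (unsigned char)sobel3x3(
                p[-w - 1], p[-w], p[-w + 1], p[-1], p[1], p[w - 1], p[w], p[w + 1]);
        }
    }
}

/* Single work-item style: one pixel enters a (2w + 3)-entry shift register per
 * iteration, so every output is formed from values already held on chip. */
void sobel_shift_register(const unsigned char *in, unsigned char *out, int w, int h) {
    assert(w > 0 && w <= MAX_W);
    int sr[2 * MAX_W + 3] = {0};
    int len = 2 * w + 3;
    memset(out, 0, (size_t)w * h);
    for (int i = 0; i < w * h; i++) {
        /* Shift by one position; in the kernel this loop would be fully unrolled. */
        for (int k = 0; k < len - 1; k++) sr[k] = sr[k + 1];
        sr[len - 1] = in[i];
        int r = i / w, c = i % w; /* row and column of the newest pixel */
        if (r < 2 || c < 2) continue; /* 3x3 window not yet fully inside the image */
        /* sr[k] now holds in[i - (len - 1) + k]; the window is centred on (r-1, c-1). */
        out[(size_t)(r - 1) * w + (c - 1)] = (unsigned char)sobel3x3(
            sr[0], sr[1], sr[2], sr[w], sr[w + 2], sr[2 * w], sr[2 * w + 1], sr[2 * w + 2]);
    }
}
```

    Because the register holds exactly two image rows plus three pixels, each output is built from values already on chip instead of issuing nine global-memory loads per output, which is the kind of structural information the compiler can exploit.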