What is the best work-group size, SIMD width, and number of compute units for a kernel?
Hi, I recently started using the OpenCL framework and want to implement a Sobel filter as a test case. I know that, unlike a GPU, an FPGA has a flexible architecture and the designer is responsible for creating it. A good thing about the GPU implementation was that the GPU OpenCL compiler can automatically set the best work-group size for the designer, or there is a "kernel analyzer" to suggest the best work-group size before compilation.

Since every FPGA compilation takes around 1 hour and emulation timing is not accurate, is there any way (like the "kernel analyzer" in the GPU implementation) to find the best optimization for my kernel on the FPGA other than trial and error? What about the best combination of compute units and SIMD for a single work-item kernel?

The first attached file is the NDRange kernel for the Sobel filter, and the second attached file is the single work-item kernel for the Sobel filter.

Thanks.