Forum Discussion
As of now, the tools for profiling and kernel analysis are very limited. I agree that designing for performance by trial and error is unrealistic, with very large designs taking around 6 hours or more each. For now, if you want to know which implementation gives the best performance, the way to do it is to understand how the OpenCL code maps to the FPGA fabric. The optimization guide does a pretty decent job of explaining how to structure the kernel so that the tools get as much information as possible and can optimize it as well as they can. The profiler only shows performance in terms of memory access. The features you've suggested would definitely come in handy, especially during the fine-tuning stage, when you're trying to get as much performance out of the FPGA as possible.
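To make the "give the tools as much information as possible" point a bit more concrete, here is a minimal sketch of the kind of kernel structure I mean. The kernel name `scale_buffer` and its arguments are made up for illustration, and the `#ifndef` shim at the top is just my own assumption so the same source also builds as plain C for a quick functional check on the host. The `const` and `restrict` qualifiers tell the compiler that the input is read-only and that the two buffers do not alias, and the single loop with a simple bound is the sort of thing the compiler can usually pipeline.

```c
/* Shim (assumption, for host-side checking only): an OpenCL compiler
   defines __OPENCL_VERSION__, a plain C compiler does not. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
#endif

/* Hypothetical single work-item kernel: out[i] = scale * in[i].
   - const:    the input buffer is never written
   - restrict: in and out never overlap, so loads and stores can be reordered
   - one loop with a simple trip count, no data-dependent exits */
__kernel void scale_buffer(__global const float *restrict in,
                           __global float *restrict out,
                           const float scale,
                           const int n)
{
    for (int i = 0; i < n; ++i) {
        out[i] = scale * in[i];
    }
}
```

None of this tells you how fast the result will be on the fabric; it only removes reasons for the compiler to be conservative, which is about all you can do up front without better analysis tools.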