cl-fast-relaxed-math and profiling tools

Hi,

I have two questions.

First:

The OpenCL standard provides the -cl-fast-relaxed-math build option, which trades some accuracy for speed.

I tested my OpenCL code with this flag on Intel, NVIDIA and AMD platforms.

It gave a speedup of about 1x.

But when I add -cl-fast-relaxed-math while compiling the same OpenCL kernel code with the AOCL compiler, there seems to be no performance gain. Does AOCL not support this flag yet?

Second:

I wrote an OpenCL program that calls clEnqueueNDRangeKernel many times (it is enqueued repeatedly in a for loop). The host only issues API calls and reads/writes buffers. However, between the host issuing clEnqueueNDRangeKernel or a buffer read/write and the FPGA receiving the command and starting the kernel, 10~100 ms of overhead is lost. There is no profiling tool that shows in detail where this time goes. Could anyone help with this problem?
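To make the question more concrete, here is a minimal sketch of how the overhead could be split up with standard OpenCL event profiling. It assumes the command queue is created with CL_QUEUE_PROFILING_ENABLE and that the four timestamps are read with clGetEventProfilingInfo. Whether the 14.1 runtime on the DE5 reports all four values is an assumption I have not verified. The names event_times, event_breakdown, split_event and print_breakdown are hypothetical helpers, not part of any SDK.

```c
#include <stdio.h>

/* Timestamps in nanoseconds, in the form OpenCL event profiling reports them.
 * Assumption: on a queue created with CL_QUEUE_PROFILING_ENABLE, each field
 * would be filled by clGetEventProfilingInfo() with
 * CL_PROFILING_COMMAND_QUEUED, _SUBMIT, _START and _END respectively.
 * This struct and the helpers below are hypothetical. */
typedef struct {
    unsigned long long queued;
    unsigned long long submit;
    unsigned long long start;
    unsigned long long end;
} event_times;

/* Per-phase durations in milliseconds. */
typedef struct {
    double queue_ms;   /* queued -> submit: time spent waiting in the host queue */
    double launch_ms;  /* submit -> start:  time until the device begins the command */
    double exec_ms;    /* start  -> end:    time the command itself runs */
} event_breakdown;

/* Convert nanoseconds to milliseconds. */
double ns_to_ms(unsigned long long ns)
{
    return (double)ns / 1.0e6;
}

/* Split one event's timestamps into the three phases above. */
event_breakdown split_event(event_times t)
{
    event_breakdown b;
    b.queue_ms  = ns_to_ms(t.submit - t.queued);
    b.launch_ms = ns_to_ms(t.start  - t.submit);
    b.exec_ms   = ns_to_ms(t.end    - t.start);
    return b;
}

/* Print one line per enqueued command, e.g. once per loop iteration. */
void print_breakdown(const char *label, event_breakdown b)
{
    printf("%s: queue %.3f ms, launch %.3f ms, exec %.3f ms\n",
           label, b.queue_ms, b.launch_ms, b.exec_ms);
}
```

If the launch phase (submit to start) turns out to be much larger than the exec phase for every iteration, the 10~100 ms would be host-to-FPGA command overhead rather than kernel time.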

SDK : 14.1

platform : DE5

Thanks
