Forum Discussion
Altera_Forum
Honored Contributor
8 years ago

Thanks again for your help.
I have now executed one of my kernels on an actual FPGA. However, I see neither a speed-up nor a slow-down when I change the kernel from NDRange to single work-item (which I was not expecting, at least for this simple kernel). The kernel I am executing on the FPGA contains the following:

```c
uint2 pixel = (uint2)(get_global_id(0), get_global_id(1));
depth = ...
```

To change it to single work-item, I rewrote it this way (also replacing clEnqueueNDRangeKernel with clEnqueueTask in the host code):

```c
for (uint pixel_y = 0; pixel_y < 240; pixel_y++) {
    for (uint pixel_x = 0; pixel_x < 320; pixel_x++) {
        depth = ...
    }
}
```

Is this the correct way of converting to a single-work-item kernel (I did not find a proper method documented anywhere)? If not, how should I do it, and what would you suggest to improve the execution time of this kernel?

How about a more complex kernel in my application, like the one below? Should I convert it to single work-item (as above) and follow the optimization report, or follow the guide on how to improve NDRange kernels?

```c
const uint2 pos  = (uint2)(get_global_id(0), get_global_id(1));
const uint2 size = (uint2)(get_global_size(0), get_global_size(1));
const float center = in[pos.x + size.x * pos.y];

if (center == 0) {
    out[pos.x + size.x * pos.y] = 0;
    return;
}

for (int i = -r; i <= r; ++i) {
    for (int j = -r; j <= r; ++j) {
        // Clamp in signed arithmetic: with clamp(pos.x + i, 0u, size.x - 1),
        // a negative offset at the left border wraps to a huge uint and
        // clamps to the right border instead of 0.
        const uint2 curPos = (uint2)(clamp((int)pos.x + i, 0, (int)size.x - 1),
                                     clamp((int)pos.y + j, 0, (int)size.y - 1));
        const float curPix = in[curPos.x + curPos.y * size.x];
        if (curPix > 0) {
            sum += factor;
        }
    }
}
out[pos.x + size.x * pos.y] = t / sum;
```

The reason I am asking is that, for the moment, I have to stick with the current old version of AOCL. I want to get a feel for what to think about and follow the correct optimization path from the start, rather than having to wait a few hours for each method and approach to compile before I can see the timing results.

Thank you very much.