Altera_Forum
Honored Contributor
8 years ago
Improvement of self-written OpenCL function (GaussianBlur)
Hello, I have implemented a Gaussian filter on an FPGA (Cyclone V SoC) using OpenCL. It works fine (2.5 times faster than on the ARM), but I'm not sure whether it is optimal for the FPGA.
host-code:

...
status = clSetKernelArg(kernel, 0, sizeof(cl_mem), &buffer_img);       // matrix which holds a grayscale image
status = clSetKernelArg(kernel, 1, sizeof(cl_mem), &buffer_mask);      // matrix which holds the kernel parameters
status = clSetKernelArg(kernel, 2, sizeof(cl_mem), &buffer_outputimg); // matrix for output
status = clSetKernelArg(kernel, 3, sizeof(int), &img.cols);
status = clSetKernelArg(kernel, 4, sizeof(int), &maskwidth);

size_t globalWorkSize[2];
globalWorkSize[0] = output.cols;
globalWorkSize[1] = output.rows;
status = clEnqueueNDRangeKernel(cmdQueue, kernel, 2, NULL, globalWorkSize, NULL, 0, NULL, NULL);
...

kernel-code:

__kernel void convolve(__global uchar * input,
                       __global float * mask,
                       __global uchar * output,
                       const int inputWidth,
                       const int maskWidth)
{
    // one work-item per output pixel
    const int x = get_global_id(0);
    const int y = get_global_id(1);

    float sum = 0;
    for (int r = 0; r < maskWidth; r++)
    {
        // advance the row index by the image width
        const int idxrow = (y + r) * inputWidth + x;
        for (int c = 0; c < maskWidth; c++)
        {
            // convolve
            sum += mask[(r * maskWidth) + c] * input[idxrow + c];
        }
    }
    output[y * get_global_size(0) + x] = sum;
}

Can someone tell me if and how it is possible to improve the performance of the Gaussian kernel on the FPGA?

Thanks :)
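For checking the FPGA output against the ARM, a plain C reference that mirrors the kernel's indexing may be useful. This is only a minimal sketch: the function name convolve_ref and the inputHeight parameter are hypothetical, and it assumes the output image is (inputWidth - maskWidth + 1) by (inputHeight - maskWidth + 1) pixels, which the host code above does not show. It also assumes a normalized mask, so that each sum stays within 0..255.

```c
#include <assert.h>

/* Hypothetical CPU reference for the convolve kernel (not from the post).
 * Assumption: the output covers only positions where the mask fits fully
 * inside the input, i.e. outWidth = inputWidth - maskWidth + 1. */
static void convolve_ref(const unsigned char *input, const float *mask,
                         unsigned char *output,
                         int inputWidth, int inputHeight, int maskWidth)
{
    assert(maskWidth > 0 && maskWidth <= inputWidth && maskWidth <= inputHeight);

    const int outWidth  = inputWidth  - maskWidth + 1;
    const int outHeight = inputHeight - maskWidth + 1;

    /* The two outer loops play the role of the 2D NDRange:
     * x corresponds to get_global_id(0), y to get_global_id(1). */
    for (int y = 0; y < outHeight; y++) {
        for (int x = 0; x < outWidth; x++) {
            float sum = 0.0f;
            for (int r = 0; r < maskWidth; r++) {
                /* start of the current input row for this output pixel */
                const int idxrow = (y + r) * inputWidth + x;
                for (int c = 0; c < maskWidth; c++)
                    sum += mask[r * maskWidth + c] * (float)input[idxrow + c];
            }
            /* Truncates toward zero; assumes sum is within 0..255. */
            output[y * outWidth + x] = (unsigned char)sum;
        }
    }
}
```

Comparing this buffer element by element with the one read back from the device gives a quick correctness check before and after any change to the kernel.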