Forum Discussion
Altera_Forum
Honored Contributor
8 years ago
Unrolling the loop on c improved the kernel. After reading the Best Practices Guide, I tried to improve my GaussianBlur function as suggested in section 1.6.1.5, with help from the example in the guide and the example implementation of a Sobel filter (https://www.altera.com/support/support-resources/design-examples/design-software/opencl/sobel-filter.html). But the output of my kernel isn't correct.
Kernel code:
#define maskWidth 7
#define COLS 640
__kernel void gaussneu(global uchar * restrict frame_in, global uchar * restrict frame_out,
                       const int iterations/*, const int COLS*/)
{
    // Filter coefficients
    float mask[maskWidth][maskWidth] =
    {
        { 0.0049, 0.0092, 0.0134, 0.0152, 0.0134, 0.0092, 0.0049},
        { 0.0092, 0.0172, 0.0250, 0.0283, 0.0250, 0.0172, 0.0092},
        { 0.0134, 0.0250, 0.0364, 0.0412, 0.0364, 0.0250, 0.0134},
        { 0.0152, 0.0283, 0.0412, 0.0467, 0.0412, 0.0283, 0.0152},
        { 0.0134, 0.0250, 0.0364, 0.0412, 0.0364, 0.0250, 0.0134},
        { 0.0092, 0.0172, 0.0250, 0.0283, 0.0250, 0.0172, 0.0092},
        { 0.0049, 0.0092, 0.0134, 0.0152, 0.0134, 0.0092, 0.0049},
    };
    // Pixel buffer of 6 rows and 7 extra pixels
    int rows[6 * COLS + 7];
    // The initial iterations are used to initialize the pixel buffer.
    int count = -(6 * COLS + 7);
    while (count != iterations)
    {
        // Each cycle, shift a new pixel into the buffer.
        // Unrolling this loop allows the compiler to infer a shift register.
        #pragma unroll
        for (int i = COLS * 6 + 6; i > 0; --i)
        {
            rows[i] = rows[i - 1];
        }
        // Once count >= 0, shift in frame_in[count]; before that, shift in zeros.
        rows[0] = count >= 0 ? frame_in[count] : 0;
        float sum = 0;
        // With these loops unrolled, one convolution can be computed every cycle.
        #pragma unroll
        for (int i = 0; i < maskWidth; ++i)
        {
            #pragma unroll
            for (int j = 0; j < maskWidth; ++j)
            {
                uchar pixel = rows[i * COLS + j];
                sum += mask[i][j] * pixel;
            }
        }
        if (count >= 0) {
            frame_out[count] = sum;
        }
        count++;
    }
}
Can someone tell me what the problem is? I also think there is an error in the sobel.cl file from the Altera example, in this part:
    int temp = abs(x_dir) + abs(y_dir);
    unsigned int clamped;
    if (temp > threshold) {
        clamped = 0xffffff;
    } else {
        clamped = 0;
    }
    if (count >= 0) {
        frame_out[count] = clamped;
    }
The computed value temp is never written to the output; it is only compared against threshold, so the output can only take the values 0 or 0xffffff.
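One way to narrow down the wrong gaussneu output might be to compare it against a plain C model on the host. The sketch below is only an assumption about what the shift-register loop computes: it takes rows[0] to be the newest pixel and the window to be read as rows[i * COLS + j], which gives out[count] = sum of mask[i][j] * in[count - (i * cols + j)], with zeros for negative indices. The function name gauss_reference, the flattened row-major mask argument, and the cols parameter are hypothetical and not part of the Altera example.

```c
#include <stddef.h>

#define MASK_WIDTH 7

/*
 * Hypothetical host-side reference for the shift-register kernel.
 * Assumes rows[0] holds the newest pixel, so rows[k] corresponds to
 * in[count - k], and pixels before the start of the frame read as zero.
 * mask is a flattened, row-major MASK_WIDTH x MASK_WIDTH array.
 */
void gauss_reference(const unsigned char *in, unsigned char *out,
                     int n, int cols, const float *mask)
{
    for (int count = 0; count < n; ++count) {
        float sum = 0.0f;
        for (int i = 0; i < MASK_WIDTH; ++i) {
            for (int j = 0; j < MASK_WIDTH; ++j) {
                /* Same offset the kernel uses to index the shift register. */
                int idx = count - (i * cols + j);
                float pixel = (idx >= 0) ? (float)in[idx] : 0.0f;
                sum += mask[i * MASK_WIDTH + j] * pixel;
            }
        }
        /* Float to uchar conversion truncates, as in the kernel's store. */
        out[count] = (unsigned char)sum;
    }
}
```

If this model matches what the kernel does, two things follow from the indexing: the result written at position count belongs to a window that ends at that pixel rather than being centered on it, so the blurred image would appear shifted by three rows and three columns, and the window wraps across the end of each image row because the buffer is one flat array.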