Forum Discussion
Altera_Forum
Honored Contributor
8 years ago
--- Quote Start ---
Are you launching this as a single work-item kernel or as an NDRange kernel? Since you're using get_global_id (or trying to), I presume NDRange. Maybe the design would work better as a single work-item kernel. Some code (from the host and kernel) might help to explain.
--- Quote End ---
It is an NDRange kernel. As soon as I add the get_global_id call, the kernel becomes much slower. The call does return the correct global ID when I actually use it.
__attribute__((num_compute_units(2)))
__attribute__((reqd_work_group_size(2, 1, 1)))
__kernel void pointWiseMul(__global float2* restrict d_afCorr,
                           __global float2* restrict d_afPadScn,
                           __global float2* restrict d_afPadTpl,
                           int dataN, float fScale)
{
    int begin = get_global_id(0); // comment out this line and the speed changes dramatically
    for (int iIndx = 0; iIndx < dataN; iIndx++)
    {
        float2 cDat = d_afPadScn[iIndx];
        float2 cKer = d_afPadTpl[iIndx];
        // take the complex conjugate of the kernel (template) value
        cKer.y = -cKer.y;
        // complex product cDat * conj(cKer)
        float2 cMul = { cDat.x * cKer.x - cDat.y * cKer.y, cDat.y * cKer.x + cDat.x * cKer.y };
        // scale the real and imaginary parts
        cMul.x = fScale * cMul.x;
        cMul.y = fScale * cMul.y;
        d_afCorr[iIndx] = cMul;
    }
}
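For checking the kernel's output on the host, a plain C reference of the same arithmetic can help. This is only a sketch: cfloat2 and pointwise_mul_ref are hypothetical names (not part of the OpenCL API or of the original host code), and it assumes each output element is fScale * scene * conj(template) at the same index.

```c
#include <assert.h>

/* Hypothetical host-side stand-in for OpenCL's float2 (x = real, y = imaginary). */
typedef struct { float x, y; } cfloat2;

/* Reference computation: corr[i] = scale * scn[i] * conj(tpl[i]).
 * Assumption: one output element per input index. */
void pointwise_mul_ref(cfloat2 *corr, const cfloat2 *scn,
                       const cfloat2 *tpl, int n, float scale)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        /* conjugate of the template value */
        float kx = tpl[i].x;
        float ky = -tpl[i].y;
        /* complex product, then scale both parts */
        corr[i].x = scale * (scn[i].x * kx - scn[i].y * ky);
        corr[i].y = scale * (scn[i].y * kx + scn[i].x * ky);
    }
}
```

For example, with scn[0] = (1, 2), tpl[0] = (3, 4) and scale = 0.5, the product (1 + 2i)(3 - 4i) = 11 + 2i is scaled to (5.5, 1), which can be compared against the buffer read back from the device.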