Altera_Forum
Honored Contributor
7 years ago
Efficient global memory access for dynamic indexing
Hello,
My OpenCL task (no NDRange) has a dynamically indexed access inside a loop. What is the maximum bandwidth I can expect from this code for the 'value' array? Will coalesced access work for it? All arrays are very large global-memory buffers. Currently I get one 32-bit value (float) per 2 clock cycles, which I guess is suboptimal.

    for (unsigned i = 0; i < n; i++) {
        acc = 0.0;
        ei = end_index; si = start_index;
        for (unsigned j = si; j < ei; j++)
            acc += value[dyn_index[j]];
        next_value[i] = acc;
    }
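To make the access pattern concrete, here is a plain C model of the loop that can be run on a host. It is only a sketch, not the kernel itself: the function name `gather_accumulate` is made up for illustration, and it assumes the start and end bounds are per-row arrays indexed by `i`, which the snippet above does not show explicitly.

```c
#include <assert.h>

/* Host-side model of the loop in the post (hypothetical helper).
 * Assumption: row i covers positions start_index[i] .. end_index[i]-1
 * of dyn_index; the original snippet does not show this indexing. */
void gather_accumulate(const float *value, const unsigned *dyn_index,
                       const unsigned *start_index, const unsigned *end_index,
                       float *next_value, unsigned n)
{
    for (unsigned i = 0; i < n; i++) {
        unsigned si = start_index[i];
        unsigned ei = end_index[i];
        assert(si <= ei);                 /* each row must be a valid range */

        float acc = 0.0f;
        for (unsigned j = si; j < ei; j++)
            acc += value[dyn_index[j]];   /* data-dependent (gather) read */
        next_value[i] = acc;              /* one sequential write per row */
    }
}
```

In this model the reads of `dyn_index` are sequential in `j`, and only the reads of `value` depend on data loaded at run time.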