Altera_Forum
Honored Contributor
9 years ago
Memory replication OpenCL 16.1
I'm stumped by the needless replication of __local memory objects in a trivial OpenCL kernel. The local array has only one read and one write. I'm not sure what other directive would tell the compiler that there is only a single workgroup and so prevent the needless triplication of local memory.
Here's my simple kernel.
__attribute__((reqd_work_group_size(16384,1,1)))
__attribute__((max_work_group_size(16384)))
__kernel void test(__global int* restrict x, __global int* restrict y) {
    __local int x_l[16384];
    int i = get_global_id(0);
    x_l[i] = x[i];
    barrier(CLK_LOCAL_MEM_FENCE);
    y[i] = x_l[i] * x_l[i];
}
Here's a snippet from the aocl report.html:
- test.cl:4 (x_l):
- Local memory: Good but replicated. Requested size 65536 bytes (rounded up to nearest power of 2), implemented size 196608 bytes, replicated 3 times total, stall-free, 1 read and 1 write. Additional information:
- Replicated 3 times to efficiently support multiple simultaneous workgroups. This replication resulted in 4 times increase in actual block RAM usage. Reducing the number of barriers or increasing max_work_group_size may help reduce this replication factor.
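For reference, the two byte counts in the report can be reproduced with simple arithmetic. The sketch below is plain host-side C, not kernel code, and the helper names requested_bytes and implemented_bytes are made up for illustration. It assumes that replication simply multiplies the requested size by the number of copies, which matches the 65536 and 196608 figures but does not by itself explain the "4 times increase in actual block RAM usage" that the report also mentions.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes requested by one __local array of `elems` elements.
 * An OpenCL C int is always 32 bits, so elem_size is 4 for x_l. */
static size_t requested_bytes(size_t elems, size_t elem_size) {
    return elems * elem_size;
}

/* Bytes implemented once the compiler keeps `copies` replicas.
 * Assumption: replication is a plain multiplication of the
 * requested size; the real block RAM mapping may differ. */
static size_t implemented_bytes(size_t requested, size_t copies) {
    assert(copies >= 1); /* the original array counts as one copy */
    return requested * copies;
}
```

With the numbers from my kernel, requested_bytes(16384, 4) gives the 65536 bytes the report calls the requested size, and implemented_bytes(65536, 3) gives the 196608 bytes it calls the implemented size.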