Altera_Forum
Honored Contributor
10 years ago
Scaling up vector add example
I'm trying to scale up the vector add example to use more FPGA resources.
Kernel is unmodified:

__kernel void vector_add(__global const float * restrict x,
                         __global const float * restrict y,
                         __global float * restrict z)
{
    // get index of the work item
    int index = get_global_id(0);
    // add the vector elements
    z[index] = x[index] + y[index];
}
I tried increasing the work group size:

__attribute__((reqd_work_group_size(1024,1,1)))

However, aoc reports the same device utilization regardless of the size I use. The optimization guide implies that when a work group size is specified, the compiler will attempt to build the hardware for that work group size, which scales up the design. Is this true?

Alternatively, I can vectorize or increase the number of compute units to scale up the design, but vectorization (num_simd_work_items) is limited to 16, and compute units seem to come with a lot of overhead.

So:
1. What does the reqd_work_group_size attribute do exactly?
2. What's the best way to scale up a simple NDRange kernel like this?
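For reference, the two alternatives mentioned above (vectorization and compute unit replication) could be written on this kernel roughly as in the sketch below. It assumes the attribute syntax of the Altera SDK for OpenCL (num_simd_work_items and num_compute_units used together with reqd_work_group_size). The values 16 and 2 are illustrative assumptions only, not tuned or measured settings.

```c
// Sketch only: attribute names follow the Altera SDK for OpenCL.
// The values 16 and 2 are illustrative assumptions, not tuned settings.
__attribute__((reqd_work_group_size(1024,1,1)))  // fix the work-group size at compile time
__attribute__((num_simd_work_items(16)))         // vectorize across 16 work items; must evenly divide 1024
__attribute__((num_compute_units(2)))            // replicate the kernel pipeline as 2 compute units
__kernel void vector_add(__global const float * restrict x,
                         __global const float * restrict y,
                         __global float * restrict z)
{
    // get index of the work item
    int index = get_global_id(0);
    // add the vector elements
    z[index] = x[index] + y[index];
}
```

With reqd_work_group_size set, the host has to pass a matching local work size (1024 here) to clEnqueueNDRangeKernel, and the global size has to be a multiple of it.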