I believe I'm facing a similar problem.
The difference is that I have only two kernels, and they are called inside a for loop.
I'm freeing each vector after each use in the loop. However, there seems to be an upper limit on how much memory I can allocate in total.
for(int j = 0; j < REPEAT ; j++) {
    ...
    ...
    ...
    /* aligned host copy of output */
    void *rand_input = NULL;
    posix_memalign(&rand_input, AOCL_ALIGNMENT, sizeof(float) * SIZE);
    memcpy(rand_input, output, sizeof(float) * SIZE);

    /* device buffer initialized from the host copy */
    cl_mem input_buffer = clCreateBuffer(my_context,
                                         CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                         sizeof(float) * SIZE,
                                         rand_input,
                                         &status);

    /* free the host-side copies */
    free(rand_input);
    free(output);
    ....
    ....
}
These are the error messages I get when launching the kernel:
Context callback: Could not allocate a buffer of the specified size due to fragmentation or exhaustion
Context callback: Could not map host buffers to device