Altera_Forum
Honored Contributor
8 years ago
Calling clEnqueueTask multiple times is not working.
I am running a program on a server with 32 GB of RAM. We need to do a matrix multiplication of a 64x64x256 image with a filter of the same size (64x64x256). The problem is that we have 2614600 such filters, and both the image and the filter pixel values are floats.
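To put those numbers in perspective, here is a small back-of-the-envelope sketch in C. The helper names `volume_bytes` and `total_bytes` are made up for illustration, and the figures assume a 4-byte `float`, which holds on common platforms but is not guaranteed by the C standard.

```c
#include <stdint.h>

/* Bytes occupied by one 64x64x256 volume of floats (image or filter). */
uint64_t volume_bytes(void) {
    return (uint64_t)64 * 64 * 256 * sizeof(float);
}

/* Total bytes needed to hold `count` such volumes at once. */
uint64_t total_bytes(uint64_t count) {
    return count * volume_bytes();
}
```

With a 4-byte float, `volume_bytes()` is 4194304 (4 MiB) and `total_bytes(2614600)` is 10966427238400, roughly 10 TiB. That is far more than 32 GB, so the full filter set presumably cannot sit in host RAM at once and has to be streamed to the device one filter at a time.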
We implemented this roughly as shown below, but the program gets killed without any error or warning, not even a segmentation fault. Watching the process in top, we noticed that its virtual memory usage keeps increasing, and at some threshold the program gets killed.

Note that this is pseudo code to cross-check our requirement. The original code will not iterate up to 69206016; the count will depend on some parameters. Note also that the filter count of 2614600 does not matter, as we may get more filters in the future. Our main concern is to run this loop forever. Currently our code terminates after around 32620 iterations, while theoretically it should run forever. We just don't know what we are doing wrong. Can anyone help us with this?

One more thing to know: if we remove the wait on the kernel event, the program keeps running, but waiting is also a requirement for us.
// Command queue
fc_queue = clCreateCommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, &status);
checkError(status, "Failed to create command queue");
FC_input_buf = clCreateBuffer(context, CL_MEM_READ_ONLY, 64 * 64 * 256 * sizeof(float), NULL, &status);
checkError(status, "Failed to create buffer for input image");
// The kernel writes its result here, so the buffer must be writable by the device.
FC_output_buf = clCreateBuffer(context, CL_MEM_WRITE_ONLY, 1024 * sizeof(float), NULL, &status);
checkError(status, "Failed to create buffer for output");
status = clEnqueueWriteBuffer(fc_queue, FC_input_buf, CL_TRUE, 0, 64 * 64 * 256 * sizeof(float), input_image, 0, NULL, NULL);
checkError(status, "Failed to transfer input.");
for (unsigned int ff = 0; ff < 2614600; ff++) { // Number of filters
    // Buffer
    FC_weight_buf = clCreateBuffer(context, CL_MEM_READ_ONLY, 64 * 64 * 256 * sizeof(float), NULL, &status);
    checkError(status, "Failed to create buffer for FC_weights");
    // NOTE: FC_weights is shared by every iteration in this pseudo code. In the original, it will be some offset into FC_weights based on the iteration number, but the size will be the same for each iteration.
    status = clEnqueueWriteBuffer(fc_queue, FC_weight_buf, CL_TRUE, 0, 64 * 64 * 256 * sizeof(float), FC_weights, 0, NULL, NULL);
    checkError(status, "Failed to transfer weights");
    unsigned argi = 0;
    status = clSetKernelArg(fc_kernal, argi++, sizeof(cl_mem), &FC_input_buf);
    checkError(status, "Failed to set argument %d", argi - 1);
    status = clSetKernelArg(fc_kernal, argi++, sizeof(cl_mem), &FC_weight_buf);
    checkError(status, "Failed to set argument %d", argi - 1);
    status = clSetKernelArg(fc_kernal, argi++, sizeof(cl_mem), &FC_output_buf);
    checkError(status, "Failed to set argument %d", argi - 1);
    cl_event kernel_event = NULL;
    status = clEnqueueTask(fc_queue, fc_kernal, 0, NULL, &kernel_event);
    checkError(status, "Failed to launch kernel");
    // NOTE: we have to wait until this kernel finishes executing. We could also use an NDRange kernel here; I get the same problem with an NDRange kernel.
    clWaitForEvents(1, &kernel_event);
    clReleaseMemObject(FC_weight_buf);
    FC_weight_buf = NULL;
}
clReleaseMemObject(FC_input_buf);
clReleaseMemObject(FC_output_buf);
clReleaseCommandQueue(fc_queue);
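One likely cause of the steadily growing memory, offered as a guess rather than a confirmed diagnosis: every clEnqueueTask call in the loop above returns a new cl_event, and the loop never passes it to clReleaseEvent, so each iteration leaves one event object alive. That would also fit the observation that the program keeps running once the event wait is removed, if the event pointer was dropped from the enqueue call at the same time. The sketch below shows the enqueue, wait, release pattern. `run_tasks`, the `USE_REAL_OPENCL` switch, and the stub functions are assumptions made for illustration; the stubs only mimic event allocation so the sketch can run without an OpenCL runtime, and they are not part of any SDK.

```c
#include <assert.h>
#include <stdlib.h>

#ifdef USE_REAL_OPENCL              /* hypothetical switch: build against a real SDK */
#include <CL/cl.h>
#else
/* Stand-in types and functions (NOT the real OpenCL implementation).
   They model only the event lifetime: enqueue allocates, release frees. */
typedef int cl_int;
typedef unsigned int cl_uint;
typedef void *cl_command_queue;
typedef void *cl_kernel;
struct stub_event { int unused; };
typedef struct stub_event *cl_event;
#define CL_SUCCESS 0

static long live_events = 0;        /* events allocated but not yet released */

static cl_int clEnqueueTask(cl_command_queue q, cl_kernel k, cl_uint n,
                            const cl_event *wait_list, cl_event *ev) {
    (void)q; (void)k; (void)n; (void)wait_list;
    if (ev) {
        *ev = malloc(sizeof **ev);
        if (!*ev) return -1;
        live_events++;
    }
    return CL_SUCCESS;
}

static cl_int clWaitForEvents(cl_uint n, const cl_event *evs) {
    (void)n; (void)evs;
    return CL_SUCCESS;
}

static cl_int clReleaseEvent(cl_event ev) {
    assert(live_events > 0);
    free(ev);
    live_events--;
    return CL_SUCCESS;
}
#endif

/* Launch the kernel `iterations` times. Each launch is waited on, and its
   event is released before the next launch. Returns the first error seen. */
cl_int run_tasks(cl_command_queue queue, cl_kernel kernel,
                 unsigned long iterations) {
    for (unsigned long i = 0; i < iterations; i++) {
        cl_event kernel_event = NULL;
        cl_int status = clEnqueueTask(queue, kernel, 0, NULL, &kernel_event);
        if (status != CL_SUCCESS) return status;

        status = clWaitForEvents(1, &kernel_event);
        /* Release even when the wait fails, so the event is never leaked. */
        clReleaseEvent(kernel_event);
        if (status != CL_SUCCESS) return status;
    }
    return CL_SUCCESS;
}
```

In the loop from the question, the equivalent change would be a single `clReleaseEvent(kernel_event);` right after `clWaitForEvents(1, &kernel_event);`. If the event is not needed at all, passing NULL as the last argument of clEnqueueTask and calling `clFinish(fc_queue)` is another way to block until the kernel is done.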