Altera_Forum
Honored Contributor
8 years ago
Possible Bug with clWaitForEvents
Hi,
I'm doing performance testing on an FPGA, and I time a kernel in order to calculate its GFlops. To make sure I wait for kernel completion, I call clWaitForEvents on an event that is created before enqueuing the kernel. Here is the code I use to enqueue the kernel:

Event evKernel (algorithm->getKernelName ());
err = clEnqueueNDRangeKernel (queue, kernel, algorithm->getWorkDim(),
                              NULL,
                              globalWorkSize,
                              localMemSize,
                              0, NULL, &evKernel.CLEvent());
CL_CHECK_ERROR (err);
err = clWaitForEvents (1, &evKernel.CLEvent());
evKernel.FillTimingInfo ();
evKernel is a wrapper around a cl_event object, and the FillTimingInfo function works as follows:

void Event::FillTimingInfo(const int idx)
{
    int sidx, eidx;
    if (idx == ALL_EVENTS) {
        sidx = 0; eidx = count-1;
    } else
        sidx = eidx = idx;
    for (int i=sidx ; i<=eidx ; ++i) {
        cl_int err;
        err = clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_QUEUED,
                                      sizeof(cl_ulong), &queuedTime, NULL);
        CL_CHECK_ERROR(err);
        err = clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_SUBMIT,
                                      sizeof(cl_ulong), &submitTime, NULL);
        CL_CHECK_ERROR(err);
        err = clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START,
                                      sizeof(cl_ulong), &startTime, NULL);
        CL_CHECK_ERROR(err);
        err = clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END,
                                      sizeof(cl_ulong), &endTime, NULL);
        CL_CHECK_ERROR(err);
    }
}
As you can see, I'm using clGetEventProfilingInfo to get the timing numbers. For a long time this method gave me unreasonable performance numbers, which went above the theoretical peak performance of the FPGA (1.5 TFlops). So I became suspicious of the way we do the timing and decided to do my own, by taking timestamps before clEnqueueNDRangeKernel and after clWaitForEvents. When I do that and calculate the difference, it is around 1 millisecond, while evKernel gives a much higher value.

As a result, I feel there is something wrong with event management in the Altera OpenCL host API. I suspect two issues:
1) clWaitForEvents does not really wait for kernel completion.
2) clGetEventProfilingInfo does not necessarily return the correct value.

Any thoughts on this? Just for your information, I worked on this for a long time and kept wondering whether something was wrong with my GFlops computation, but it turns out the problem is with the timing.

Thanks,
Saman