
Altera_Forum
9 years ago

clEnqueueReadBuffer performance/correctness

I have a micro-benchmark in OpenCL where I write to individual locations in a large buffer, call a kernel, and then read the locations back. The write loop looks like this:

 
for(int n=0; n<num_threads; ++n){
   int res=0;
   err_code = clEnqueueWriteBuffer(queue, dec_data, CL_TRUE, n*sizeof(int), sizeof(int), &res, 0, nullptr, nullptr);
   CHECK_CL_ERROR(err_code, "Write int failed.");
}
The kernel is minimal (local size is 256):


__kernel void initialize_memory(__global int * location){
      // Each work-item stores its own global ID at its own index.
      location[get_global_id(0)] = get_global_id(0);
}

There are two issues:

1) First, the write/read performance is very poor:

Items    WriteTime (s)    ReadTime (s)
256      0.00138866       0.00750306
512      0.0489417        0.0965984
1024     0.345966         0.550551
2048     1.73063          2.70165
4096     9.88546          18.5302
8192     97.7624          133.822
16384    403.231          646.002
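
To put numbers on how badly this scales, here is a minimal sketch that turns the totals above into an average cost per call and a growth ratio per doubling of the item count. It is not part of the benchmark; the names Sample, post_samples, per_item and doubling_ratios are made up for illustration, and the only data used are the timings listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One row of the timing output above (seconds for the whole loop).
struct Sample {
    std::size_t items;
    double write_s;
    double read_s;
};

// The seven measurements copied from the post.
std::vector<Sample> post_samples() {
    return {
        {256,   0.00138866, 0.00750306},
        {512,   0.0489417,  0.0965984},
        {1024,  0.345966,   0.550551},
        {2048,  1.73063,    2.70165},
        {4096,  9.88546,    18.5302},
        {8192,  97.7624,    133.822},
        {16384, 403.231,    646.002},
    };
}

// Average seconds per transfer for a loop of `items` calls.
double per_item(double total_s, std::size_t items) {
    assert(items > 0);
    return total_s / static_cast<double>(items);
}

// Ratio of each total to the previous one. If every call cost the same,
// doubling the item count would give a ratio of about 2.0.
std::vector<double> doubling_ratios(const std::vector<double>& totals) {
    std::vector<double> ratios;
    for (std::size_t i = 1; i < totals.size(); ++i) {
        ratios.push_back(totals[i] / totals[i - 1]);
    }
    return ratios;
}
```

Feeding the WriteTime column into doubling_ratios shows the first step (256 to 512 items) growing by roughly 35x instead of 2x, so the per-call cost itself rises with the item count rather than staying constant.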

2) Second, verification fails for 8192 threads or more: all values read back are zeros. Could a failure be going undetected in the read/write calls?
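
For context, the host-side check is roughly of the following form. This is a sketch only, assuming the kernel is meant to leave each work-item's global ID at its own index; VerifyResult and verify_ids are hypothetical names, not the actual benchmark code.

```cpp
#include <cstddef>
#include <vector>

// Summary of a verification pass over the values read back from the device.
struct VerifyResult {
    std::size_t mismatches;  // entries where host_data[i] != i
    std::size_t zeros;       // mismatching entries that read back as 0
    long first_bad;          // index of the first mismatch, or -1 if none
};

// Assumption: element i should hold the value i after the kernel has run.
VerifyResult verify_ids(const std::vector<int>& host_data) {
    VerifyResult r{0, 0, -1};
    for (std::size_t i = 0; i < host_data.size(); ++i) {
        if (host_data[i] == static_cast<int>(i)) {
            continue;  // value is as expected
        }
        ++r.mismatches;
        if (host_data[i] == 0) {
            ++r.zeros;  // stale or never-written entry
        }
        if (r.first_bad < 0) {
            r.first_bad = static_cast<long>(i);
        }
    }
    return r;
}
```

Counting zeros separately from other mismatches helps tell "the kernel never touched this entry" apart from "the kernel wrote a wrong value". Index 0 is expected to be 0, so it never counts as a failure.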

I am running this on a Nallatech Altera 5SGXMA7H2F35C2 attached to an IBM Power8 machine. The kernel was compiled with Altera SDK for OpenCL, 64-Bit Offline Compiler (Version 15.0.0 Build 145).

Any help is appreciated.

Thanks,