Altera_Forum
Honored Contributor
9 years ago
clEnqueueReadBuffer performance/correctness
I have a micro-benchmark in OpenCL where I write to individual locations in a large buffer, call a kernel, and then read the locations back. The write loop is:
for (int n = 0; n < num_threads; ++n) {
    int res = 0;
    // Blocking write (CL_TRUE) of a single int at byte offset n * sizeof(int)
    err_code = clEnqueueWriteBuffer(queue, dec_data, CL_TRUE, n * sizeof(int), sizeof(int), &res, 0, nullptr, nullptr);
    CHECK_CL_ERROR(err_code, "Write int failed.");
}
The kernel is minimal (local size is 256):
__kernel void initialize_memory(__global int *location) {
    // Each work-item stores its global id at its own index.
    int gid = get_global_id(0);
    location[gid] = gid;
}
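For reference, the host-side check can be sketched roughly as follows. This assumes the kernel is meant to store each work-item's global id in its own element, and that the values read back have been collected into a `std::vector<int>`; `first_mismatch` is a hypothetical helper name, not part of the benchmark code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of the first element that differs from its expected
// value, or -1 if every element matches.
// Assumption: element i is expected to hold the value i (the global id).
long first_mismatch(const std::vector<int>& host_data) {
    for (std::size_t i = 0; i < host_data.size(); ++i) {
        if (host_data[i] != static_cast<int>(i)) {
            return static_cast<long>(i);
        }
    }
    return -1;
}
```

Note that with an all-zero buffer, index 0 still matches its expected value, so a check like this would report the first mismatch at index 1.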
There are two issues:

1) First, the write/read performance is very poor:

256 items: WriteTime 0.00138866 s, ReadTime 0.00750306 s
512 items: WriteTime 0.0489417 s, ReadTime 0.0965984 s
1024 items: WriteTime 0.345966 s, ReadTime 0.550551 s
2048 items: WriteTime 1.73063 s, ReadTime 2.70165 s
4096 items: WriteTime 9.88546 s, ReadTime 18.5302 s
8192 items: WriteTime 97.7624 s, ReadTime 133.822 s
16384 items: WriteTime 403.231 s, ReadTime 646.002 s

2) The second issue is that the verification fails for 8192 threads or more: all values read are zeros. Is there any failure that is going undetected in the read/write calls?

I am running this on a Nallatech Altera 5SGXMA7H2F35C2 attached to an IBM Power8 machine. The kernel was compiled with Altera SDK for OpenCL, 64-Bit Offline Compiler (Version 15.0.0 Build 145).

Any help is appreciated.

Thanks,
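P.S. To make the scaling problem concrete, here is a small sketch that divides each reported total by the number of single-int transfers. If every blocking transfer had a fixed cost, the per-item time would stay flat; in these numbers a write goes from roughly 5 microseconds per item at 256 items to roughly 25 milliseconds per item at 16384 items. The struct and function names are made up for this illustration; only the timing values come from the measurements above.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// One row of the reported measurements.
struct Sample {
    int items;       // number of single-int transfers
    double write_s;  // total write time in seconds
    double read_s;   // total read time in seconds
};

// Timings copied from the measurements in this post.
const std::vector<Sample>& reported_samples() {
    static const std::vector<Sample> samples = {
        {256,   0.00138866, 0.00750306},
        {512,   0.0489417,  0.0965984},
        {1024,  0.345966,   0.550551},
        {2048,  1.73063,    2.70165},
        {4096,  9.88546,    18.5302},
        {8192,  97.7624,    133.822},
        {16384, 403.231,    646.002},
    };
    return samples;
}

// Average seconds spent per single-int transfer.
double per_item(double total_s, int items) {
    assert(items > 0);
    return total_s / items;
}

// Prints the per-item write and read cost for every reported size.
void print_per_item_table() {
    std::printf("%8s %14s %14s\n", "items", "write s/item", "read s/item");
    for (const Sample& s : reported_samples()) {
        std::printf("%8d %14.3e %14.3e\n", s.items,
                    per_item(s.write_s, s.items),
                    per_item(s.read_s, s.items));
    }
}
```

Calling `print_per_item_table()` from `main` prints one row per problem size, which makes the superlinear growth easy to see.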