kkvasan
New Contributor
4 years ago
oneAPI: Iterative read/write with swapped memory locations
Hi All,
I am using oneAPI to implement an application on an Arria 10 GX acceleration card for my research work. The design has a long kernel pipeline, and the input and output memory locations must be swapped on each iteration. Initially the read and write loops were separate kernels, but with that design I could not synchronise the memory reads and writes across multiple iterations. I therefore merged the read and write into one nested loop, as follows.
[[intel::max_concurrency(1)]]
for(int itr = 0; itr < 2*n_iter; itr++){
    accessor ptrR1 = (itr & 1) == 0 ? in1 : out1;
    accessor ptrW1 = (itr & 1) == 1 ? in1 : out1;
    auto input_ptr = ptrR1.get_pointer();
    auto output_ptr = ptrW1.get_pointer();

    [[intel::initiation_interval(1)]]
    [[intel::ivdep]]
    [[intel::max_concurrency(0)]]
    for(int i = 0; i < total_itr; i++){
        vec1 = ptrR1[i];
        pipeS::PipeAt<idx1>::write(vec1);
        vecW1 = pipeS::PipeAt<idx2>::read();
        ptrW1[i] = vecW1; //vecW1;
    }
}
This works, but performance is poor: the bandwidth is around 8 times lower than expected. The same inner loop without the pipes, which just copies the data to the write location, gives the expected performance. Any suggestions or advice on fixing the performance issue would be appreciated.
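To make the intended buffer swap concrete, here is a minimal host-only C++ sketch of the same even/odd ping-pong pattern. It uses no SYCL and no pipes, so it only illustrates the data flow and says nothing about FPGA performance. The names `stage` and `ping_pong` are hypothetical, and `stage` is an assumed stand-in for whatever the kernel pipeline between the two pipes computes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the kernel pipeline that sits between the
// write pipe and the read pipe; here it simply adds 1 to each element.
inline int stage(int v) { return v + 1; }

// Runs `passes` passes over two equally sized buffers. On even passes the
// data flows a -> b, on odd passes b -> a, mirroring the (itr & 1) test.
// Returns a reference to the buffer that holds the final result.
inline std::vector<int>& ping_pong(std::vector<int>& a,
                                   std::vector<int>& b,
                                   int passes) {
    assert(a.size() == b.size());
    for (int itr = 0; itr < passes; ++itr) {
        const std::vector<int>& src = (itr & 1) == 0 ? a : b;  // read side
        std::vector<int>& dst = (itr & 1) == 0 ? b : a;        // write side
        for (std::size_t i = 0; i < src.size(); ++i) {
            dst[i] = stage(src[i]);
        }
    }
    // No passes or an even count leaves the result in `a`; an odd count in `b`.
    return (passes & 1) == 0 ? a : b;
}
```

With an even pass count such as the 2*n_iter used above, the final result ends up back in the first buffer, which corresponds to in1 in my kernel.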
Many Thanks,
Vasan