Forum Discussion

kkvasan
4 years ago

oneAPI: Iterative read/write with swapped memory locations

Hi All,

I am using oneAPI to implement an application on an Arria 10 GX acceleration card for my research work. The design has a long kernel pipeline, and the input and output memory locations must be swapped on each iteration. Initially the read and write loops were separate kernels, but with that arrangement I could not synchronise the memory reads and writes across multiple iterations. I therefore merged the read and the write into one nested loop, as follows.

        [[intel::max_concurrency(1)]]
        for(int itr = 0; itr < 2*n_iter; itr++){
          // Swap roles by parity: even iterations read in1 and write out1,
          // odd iterations read out1 and write in1.
          accessor ptrR1 = (itr & 1) == 0 ? in1 : out1;
          accessor ptrW1 = (itr & 1) == 1 ? in1 : out1;

          auto input_ptr = ptrR1.get_pointer();
          auto output_ptr = ptrW1.get_pointer();

          [[intel::initiation_interval(1)]]
          [[intel::ivdep]]
          [[intel::max_concurrency(0)]]
          for(int i = 0; i < total_itr; i++){
            // Feed one element into the head of the kernel pipeline.
            vec1 = ptrR1[i];
            pipeS::PipeAt<idx1>::write(vec1);

            // Collect one result from the tail of the pipeline and store it.
            vecW1 = pipeS::PipeAt<idx2>::read();
            ptrW1[i] = vecW1;
          }

        }

This works, but performance is reduced: bandwidth is around 8 times lower than expected. The same inner loop without pipes, just copying data from the read location to the write location, gives the expected performance. Any suggestions or advice on fixing the performance issue would be appreciated.
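
To make the intended buffer swapping easier to follow, here is a minimal host-only sketch in plain C++. It is not oneAPI code and does not reproduce the performance problem. The function pipeline_stage is a hypothetical stand-in for whatever the real kernel pipeline does between the two pipes, and std::vector stands in for the accessors; both are assumptions made only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the kernel pipeline between the two pipes
// (assumption: the real pipeline is not shown in the post).
inline float pipeline_stage(float v) { return v + 1.0f; }

// Runs 2 * n_iter passes, swapping the read and write buffers by parity,
// mirroring the (itr & 1) selection in the loop above.
inline void ping_pong(std::vector<float>& in1, std::vector<float>& out1, int n_iter) {
    assert(in1.size() == out1.size());
    for (int itr = 0; itr < 2 * n_iter; ++itr) {
        const bool even = (itr & 1) == 0;
        const std::vector<float>& src = even ? in1 : out1;  // read side
        std::vector<float>& dst = even ? out1 : in1;        // write side
        for (std::size_t i = 0; i < src.size(); ++i) {
            dst[i] = pipeline_stage(src[i]);
        }
    }
}
```

Because the pass count 2 * n_iter is always even, the last pass is an odd one, so in this sketch the final result ends up in in1 and out1 holds the result of the pass before it.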


Many Thanks,
Vasan

11 Replies