Forum Discussion

kkvasan
New Contributor
4 years ago

oneAPI: Iterative read/write with swapped memory locations

Hi All,

I am using oneAPI to implement an application on an Arria 10 GX acceleration card for my research work. The design has a long kernel pipeline, and the input and output memory locations must be swapped on each iteration. Initially the read and write loops were separate kernels, but with that arrangement I could not synchronise the memory reads and writes across multiple iterations. I therefore merged the read and write into one nested loop as follows.

        [[intel::max_concurrency(1)]]
        for(int itr = 0; itr < 2*n_iter; itr++){
          accessor ptrR1 = (itr & 1) == 0 ? in1 : out1;
          accessor ptrW1 = (itr & 1) == 1 ? in1 : out1;

          auto input_ptr = ptrR1.get_pointer();
          auto output_ptr = ptrW1.get_pointer();

          [[intel::initiation_interval(1)]]
          [[intel::ivdep]]
          [[intel::max_concurrency(0)]]
          for(int i = 0; i < total_itr; i++){
            // Feed the pipeline from the current read buffer.
            vec1 = ptrR1[i];
            pipeS::PipeAt<idx1>::write(vec1);

            // Drain the pipeline into the current write buffer.
            vecW1 = pipeS::PipeAt<idx2>::read();
            ptrW1[i] = vecW1;
          }

        }

This works, but performance is reduced: bandwidth is around 8 times lower than expected. The same inner loop without pipes, just copying data to the write location, gives the expected performance. Any suggestions or advice on fixing the performance issue would be appreciated.
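For reference, the buffer-swapping scheme itself can be modelled on the host in plain C++. This is only a minimal sketch of the parity-based read/write selection, not the FPGA kernel: `ping_pong`, `buf_a`, `buf_b` and `pipeline_stage` are hypothetical names, and `pipeline_stage` is an assumed stand-in for whatever the kernel pipeline between the two pipes computes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the kernel pipeline between the two pipes:
// here each element is simply incremented.
inline int pipeline_stage(int x) { return x + 1; }

// Runs 2 * n_iter passes, swapping the read and write buffers on every pass.
// Even passes read buf_a and write buf_b; odd passes do the reverse.
// Because the pass count is even, the final result ends up in buf_a.
inline void ping_pong(std::vector<int>& buf_a, std::vector<int>& buf_b, int n_iter) {
    assert(buf_a.size() == buf_b.size());
    for (int itr = 0; itr < 2 * n_iter; ++itr) {
        const bool even = (itr & 1) == 0;
        const std::vector<int>& src = even ? buf_a : buf_b;
        std::vector<int>& dst = even ? buf_b : buf_a;
        for (std::size_t i = 0; i < src.size(); ++i) {
            dst[i] = pipeline_stage(src[i]);
        }
    }
}
```

Since the outer loop runs an even number of passes, the last pass is an odd one and writes back into the first buffer, so the final data is found where the original input was.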


Many Thanks,
Vasan

11 Replies

  • Hi @kkvasan,

    Great! Good to know that you are able to proceed as needed. With no further clarification required on this thread, it will be transitioned to community support for any further help, and we will no longer monitor it.
    Thank you for the questions; as always, it is a pleasure having you here.

    Best Wishes
    BB