Forum Discussion
Can you post a snippet that compiles? The current one doesn't, because definitions are missing. Since the two inner for loops are fully unrolled, I assume the high II is for the while loop. You have not used #pragma ivdep on that loop, though, so the compiler will not ignore its memory dependencies.
It is worth mentioning that if you are sure the addresses do not overlap, you should be able to modify the addressing so that indirect addressing can be avoided. Your algorithm is going to perform very poorly since accesses to the C buffer cannot be coalesced due to indirect addressing.
- NSriv2 (New Contributor), 7 years ago
Hi,
I also tried #pragma ivdep on the while loop, which didn't work. However, replacing the while loop with a for (count = 0; count < num_elems; ) loop and then applying #pragma ivdep worked.
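For anyone reading later, here is a minimal sketch of the loop shape I mean. It is not my actual kernel: the function name scatter_unique and the src and idx arrays are made up for illustration, and only C, count and num_elems come from my code. It is written as plain C so it can be compiled on a host; in the kernel the pointers would carry the usual OpenCL address space qualifiers. The pragma is guarded by __OPENCL_VERSION__, so a regular C compiler skips it.

```c
/* Hypothetical stand-in for the kernel body: scatter src into C through an
 * index table. Assumption: every entry of idx is unique, so no two
 * iterations ever write the same element of C. */
static void scatter_unique(float *C, const float *src, const int *idx,
                           int num_elems)
{
    int count;

    /* Only emitted when compiled as OpenCL C; a host C compiler skips it. */
#ifdef __OPENCL_VERSION__
#pragma ivdep
#endif
    for (count = 0; count < num_elems; ) {
        C[idx[count]] = src[count];  /* indirect (data-dependent) write */
        count++;                     /* increment stays in the body, as in the while loop */
    }
}
```

The increment clause of the for loop is left empty on purpose, so the body is the same as in the original while loop. Note that the pragma is only safe because of the uniqueness assumption; if two entries of idx were equal, ignoring the dependency could give wrong results.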
I am sure that the addresses do not overlap, since each address is unique. However, the addresses cannot be determined statically because they depend on a task scheduling algorithm. Do you expect the performance degradation to be significant? Is there a way I can improve this situation?