Forum Discussion
Hi @Mickleman,
I am not sure, but I assume that is how you get your i and j: from the division and the modulo.
I imagine you could rewrite it as follows (this does basically the same thing, without branching):
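uint16_t* c = (uint16_t*)a;      // view the 2-D array as one flat array
bool test = (k < GROUP_SIZE);    // true only for the first GROUP_SIZE elements
int i = k / GROUP_SIZE;
// Branchless select. Note that c[k-1] is still read when test is true,
// so k == 0 needs a guard in real code.
c[k] = (c[k-1] + i) * (!test) + (test) * (k - i * GROUP_SIZE);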
Written this way, you can see why the compiler doesn't spot the opportunity to parallelize over GROUP_SIZE.
I think that to get it automatically you should permute your array, a[j][i] ==> a[i][j], so that the dependency inside the loop looks like c[k-GROUP_SIZE].
I don't know if I'm making myself clear.
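To make the index arithmetic behind this suggestion concrete, here is a minimal sketch of the flat offsets in both layouts. The helper names (flat_ji, flat_ij, dep_distance_ji, dep_distance_ij) are hypothetical and not from the thread, and the sizes are shrunk for illustration. It only shows how far apart an element and the element it depends on sit in memory; it says nothing about what a particular compiler will do with that.

```cpp
#include <cassert>

// Illustrative sizes (assumption: smaller than the thread's, same shape).
constexpr int ITEM_LENGTH = 100;
constexpr int GROUP_SIZE = 10;

// Flat offset of element (i, j) when the array is declared
// a[GROUP_SIZE][ITEM_LENGTH] and accessed as a[j][i].
constexpr int flat_ji(int i, int j) { return j * ITEM_LENGTH + i; }

// Flat offset of element (i, j) when the array is declared
// a[ITEM_LENGTH][GROUP_SIZE] and accessed as a[i][j].
constexpr int flat_ij(int i, int j) { return i * GROUP_SIZE + j; }

// Distance in the flat array between element (i, j) and the element it
// depends on, (i-1, j). Valid for i >= 1.
constexpr int dep_distance_ji(int i, int j) { return flat_ji(i, j) - flat_ji(i - 1, j); }
constexpr int dep_distance_ij(int i, int j) { return flat_ij(i, j) - flat_ij(i - 1, j); }

// a[j][i]: each element depends on its immediate predecessor, c[k-1].
static_assert(dep_distance_ji(5, 3) == 1, "a[j][i] layout has distance 1");
// a[i][j]: each element depends on c[k-GROUP_SIZE], so GROUP_SIZE
// consecutive elements are independent of one another.
static_assert(dep_distance_ij(5, 3) == GROUP_SIZE, "a[i][j] layout has distance GROUP_SIZE");
```

With a distance of 1 every iteration waits on the previous one; with a distance of GROUP_SIZE there is a window of GROUP_SIZE iterations that do not depend on each other.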
Thanks @MGRAV
I'm not sure I quite understand what you are suggesting. If you could give me a version of the example changed as you propose I will try it out and feed back the results.
Regards
- MGRAV (New Contributor), 5 years ago
Maybe something like this, although I don't know whether the compiler can figure it out:
const int ITEM_LENGTH = 10000;
const int GROUP_SIZE = 10;
uint16_t a[ITEM_LENGTH][GROUP_SIZE];
uint16_t b[ITEM_LENGTH][GROUP_SIZE];

[[intel::ivdep]]
for (int k = 0; k < GROUP_SIZE * ITEM_LENGTH; k++)
{
    int i = k / GROUP_SIZE;        // row index
    int j = k - i * GROUP_SIZE;    // column index (k modulo GROUP_SIZE)
    if ( i == 0 )
        a[i][j] = j;
    else
        a[i][j] = a[i-1][j] + i;   // depends on the element GROUP_SIZE positions back
}

[[intel::ivdep]]
for (int k = 0; k < GROUP_SIZE * ITEM_LENGTH; k++)
{
    int i = k / GROUP_SIZE;
    int j = k - i * GROUP_SIZE;
    if ( i == 0 )
        b[i][j] = j;
    else
        b[i][j] = b[i-1][j] + i;
}

But what is the problem with the two loops if it works?
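For comparison, the flattened loop over k in the a[i][j] layout computes the same values as a plain two-level loop nest in which only the outer loop over i carries the dependency. The sketch below checks that equivalence in standard C++. The function names fill_flat and fill_nested are hypothetical, ITEM_LENGTH is reduced from the thread's 10000, a std::vector stands in for the 2-D array, and the [[intel::ivdep]] attribute is left out so that any compiler accepts it. Whether a given compiler parallelizes either form is not something this example can show.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative sizes (assumption: ITEM_LENGTH is reduced from the thread's 10000).
constexpr int ITEM_LENGTH = 1000;
constexpr int GROUP_SIZE = 10;

// Flattened form, as in the post: one loop over k, with i and j recovered
// from k by division and subtraction.
std::vector<std::uint16_t> fill_flat() {
    std::vector<std::uint16_t> a(ITEM_LENGTH * GROUP_SIZE);
    for (int k = 0; k < GROUP_SIZE * ITEM_LENGTH; k++) {
        int i = k / GROUP_SIZE;
        int j = k - i * GROUP_SIZE;
        if (i == 0)
            a[k] = static_cast<std::uint16_t>(j);
        else
            a[k] = static_cast<std::uint16_t>(a[k - GROUP_SIZE] + i);
    }
    return a;
}

// Nested form: the outer loop over i carries the dependency on row i-1,
// while the GROUP_SIZE iterations of the inner loop are independent.
std::vector<std::uint16_t> fill_nested() {
    std::vector<std::uint16_t> a(ITEM_LENGTH * GROUP_SIZE);
    for (int i = 0; i < ITEM_LENGTH; i++) {
        for (int j = 0; j < GROUP_SIZE; j++) {
            int k = i * GROUP_SIZE + j;  // flat offset of a[i][j]
            a[k] = (i == 0)
                       ? static_cast<std::uint16_t>(j)
                       : static_cast<std::uint16_t>(a[k - GROUP_SIZE] + i);
        }
    }
    return a;
}
```

Comparing the two results with fill_flat() == fill_nested() confirms that both forms produce identical arrays, so the choice between them is about what the compiler can analyze, not about the result.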