Altera_Forum
Honored Contributor
11 years ago
Manual loop unrolling VS. using #pragma unroll
Hello,
I noticed something strange with the offline compiler. The two code versions below perform the same operations (since the #pragma directive unrolls the loops), except that in the second version the accumulations are "packed" into a single expression (from a syntactic point of view). According to aoc, the second version is more efficient in terms of estimated throughput and logic blocks. In the context of my kernel, I get +5% estimated throughput and 2% fewer logic blocks used; it's a complex kernel, so I'm guessing the difference could be more significant on smaller kernels. I would have thought the compiler was able to unroll the MACs in the most efficient way... Am I missing something?
Thanks

#pragma unroll 7
for(int j=-3; j<4; j++) {
    #pragma unroll 7
    for(int i=-3; i<4; i++) {
        temp += L * coeffs;
    }
}

temp = L * coeffs + L * coeffs + L * coeffs
+ L * coeffs + L * coeffs + L * coeffs
+ L * coeffs + L * coeffs + L * coeffs
+ L * coeffs + L * coeffs + L * coeffs
+ L * coeffs + L * coeffs + L * coeffs
+ L * coeffs + L * coeffs + L * coeffs
+ L * coeffs + L * coeffs + L * coeffs
+ L * coeffs + L * coeffs + L * coeffs
+ L * coeffs + L * coeffs + L * coeffs
+ L * coeffs + L * coeffs + L * coeffs
+ L * coeffs + L * coeffs + L * coeffs
+ L * coeffs + L * coeffs + L * coeffs
+ L * coeffs + L * coeffs + L * coeffs
+ L * coeffs + L * coeffs + L * coeffs
+ L * coeffs + L * coeffs + L * coeffs
+ L * coeffs + L * coeffs + L * coeffs
+ L * coeffs;

btw, badOmen, if you read this: I can't reply to your PM about my other topic because I need 10 posts. Getting close to that :)
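
For anyone who wants to check the comparison on the host, here is a minimal sketch of the two forms in plain C. The array indices in the post were lost, so the row-major indexing, the function names mac_loops and mac_packed, and the reduced 3x3 window (the post uses 7x7, i.e. 49 terms) are all assumptions made for illustration. It only shows that the two forms compute the same sum; it says nothing about what aoc does with them.

```c
#define R 1               /* window radius: 1 gives 3x3; the post uses 3 (7x7) */
#define W (2 * R + 1)     /* window width */

/* Loop form. In the OpenCL kernel each loop would carry "#pragma unroll". */
static float mac_loops(const float *L, const float *coeffs)
{
    float temp = 0.0f;
    for (int j = -R; j <= R; j++) {
        for (int i = -R; i <= R; i++) {
            int k = (j + R) * W + (i + R);  /* hypothetical row-major index */
            temp += L[k] * coeffs[k];
        }
    }
    return temp;
}

/* Packed form: the same products, added in the same left-to-right order,
   but written as a single expression. */
static float mac_packed(const float *L, const float *coeffs)
{
    return L[0] * coeffs[0] + L[1] * coeffs[1] + L[2] * coeffs[2]
         + L[3] * coeffs[3] + L[4] * coeffs[4] + L[5] * coeffs[5]
         + L[6] * coeffs[6] + L[7] * coeffs[7] + L[8] * coeffs[8];
}
```

Calling both functions on the same 9-element arrays should return the same value. Since the results match, any difference in estimated throughput or logic usage would come from how the compiler schedules each form, not from the arithmetic itself.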