loop-unrolling and memory access performance

Honored Contributor

7 years ago

Thanks for your reply.

Actually, my inner loop makes a large, consecutive memory access and, as reported by the optimizer, it is pipelined well with II=1.

I put #pragma unroll 4 on the outer loop (not the inner one as you did), hoping to get 4 parallel accesses through 4 memory ports, because the outer loop body is independent across iterations (no read-after-write dependency). Area increased by a factor of 4 (both logic and BRAMs; I think the BRAMs are used as a cache for global memory), so I guess the memory ports were replicated 4 times. But performance did not change.

Do you have any guess? My guess is that the memory accesses are somehow being done serially, not in parallel.

#pragma unroll 4
for (unsigned i = 0; i < 4000000; i++)
{
    acc = 0.0;
    si = start_index;
    ei = end_index;

    for (unsigned j = si; j < ei; ++j) // pipelined with II=1
        acc += value[j]; // target memory access

    value_next[i] = acc;
}
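
To make the question concrete, here is a plain-C sketch of what I understand unrolling the outer loop by 4 to mean semantically: the loop body, including the inner loop, is copied four times. This is only an illustration, not the compiler's actual output. The function names, the float type for acc, and passing n, si and ei as parameters are my own assumptions so that the sketch compiles on its own. Written this way, the four inner loops still follow one another in program order, so whether they overlap in hardware is up to the compiler.

```c
/* Reference: the loop nest as posted, with no unrolling.
   n, si and ei are parameters only to keep the sketch self-contained. */
void sum_range_rolled(const float *value, float *value_next,
                      unsigned n, unsigned si, unsigned ei)
{
    for (unsigned i = 0; i < n; i++) {
        float acc = 0.0f;
        for (unsigned j = si; j < ei; ++j)
            acc += value[j];
        value_next[i] = acc;
    }
}

/* Hand-expanded equivalent of unrolling the outer loop by 4.
   Assumes n is a multiple of 4 (4000000 is). Each copy of the body
   has its own accumulator and its own inner loop over value[si..ei). */
void sum_range_unrolled4(const float *value, float *value_next,
                         unsigned n, unsigned si, unsigned ei)
{
    for (unsigned i = 0; i < n; i += 4) {
        float acc0 = 0.0f;
        for (unsigned j = si; j < ei; ++j)
            acc0 += value[j];
        value_next[i + 0] = acc0;

        float acc1 = 0.0f;
        for (unsigned j = si; j < ei; ++j)
            acc1 += value[j];
        value_next[i + 1] = acc1;

        float acc2 = 0.0f;
        for (unsigned j = si; j < ei; ++j)
            acc2 += value[j];
        value_next[i + 2] = acc2;

        float acc3 = 0.0f;
        for (unsigned j = si; j < ei; ++j)
            acc3 += value[j];
        value_next[i + 3] = acc3;
    }
}
```

Both functions produce the same value_next. The expansion shows four independent read streams over value, which would account for the 4x area, but nothing in the source itself says the four streams must be issued at the same time.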
