Forum Discussion

Altera_Forum
Honored Contributor
14 years ago

Tightly-coupled memory performance

I wanted to compare the performance between cache and tightly-coupled memory. So I did the following experiment:

tic
call the function()
toc

tic
call the same function()
toc

I noticed that the second run was slightly faster than the first run.

All code + data are placed in the on-chip tightly-coupled memory, even the stack.

Can anyone comment on this behavior?

9 Replies

  • Altera_Forum (Honored Contributor)

    I would look at the assembled code (objdump file) to see what the compiler is doing. I'm guessing the register-preserving operations for the first call are not being duplicated for the second call, and as a result the second call is faster. This would have nothing to do with tightly-coupled memory; it's just a code optimization.

  • Altera_Forum (Honored Contributor)

    The calling function is very simple; it is the called function that contains the reading loop.

    The called function should be the same for each call, right?
  • Altera_Forum (Honored Contributor)

    The called function's assembly code:

    05009254 <corner_turn_main>:
     5009254: 2015883a  mov r10,r4
     5009258: 0013883a  mov r9,zero
     500925c: 00001506  br 50092b4 <corner_turn_main+0x60>
     5009260: 40800017  ldw r2,0(r8)
     5009264: 3885883a  add r2,r7,r2
     5009268: 11000017  ldw r4,0(r2)
     500926c: 01400784  movi r5,30
     5009270: 3145383a  mul r2,r6,r5
     5009274: 1245883a  add r2,r2,r9
     5009278: 1085883a  add r2,r2,r2
     500927c: 1085883a  add r2,r2,r2
     5009280: 00c14474  movhi r3,1297
     5009284: 18e7c604  addi r3,r3,-24808
     5009288: 10c5883a  add r2,r2,r3
     500928c: 11000015  stw r4,0(r2)
     5009290: 00c00044  movi r3,1
     5009294: 30cd883a  add r6,r6,r3
     5009298: 00800104  movi r2,4
     500929c: 388f883a  add r7,r7,r2
     50092a0: 317fef1e  bne r6,r5,5009260 <corner_turn_main+0xc>
     50092a4: 48d3883a  add r9,r9,r3
     50092a8: 5095883a  add r10,r10,r2
     50092ac: 00800a04  movi r2,40
     50092b0: 48800426  beq r9,r2,50092c4 <corner_turn_main+0x70>
     50092b4: 5011883a  mov r8,r10
     50092b8: 000d883a  mov r6,zero
     50092bc: 000f883a  mov r7,zero
     50092c0: 003fe706  br 5009260 <corner_turn_main+0xc>
     50092c4: f800283a  ret

    The calling function assembly code:

     5008cc8: 04010034  movhi r16,1024
     5008ccc: 84062804  addi r16,r16,6304
     5008cd0: 84800037  ldwio r18,0(r16)
     5008cd4: a009883a  mov r4,r20
     5008cd8: 50092540  call 5009254 <corner_turn_main>
     5008cdc: 84400037  ldwio r17,0(r16)
     5008ce0: 8ca3c83a  sub r17,r17,r18
     5008ce4: 04c000b4  movhi r19,2
     5008ce8: 9cc7af04  addi r19,r19,7868
     5008cec: 9809883a  mov r4,r19
     5008cf0: 880b883a  mov r5,r17
     5008cf4: 000feb00  call feb0 <printf>
     5008cf8: 84800037  ldwio r18,0(r16)
     5008cfc: a009883a  mov r4,r20
     5008d00: 50092540  call 5009254 <corner_turn_main>
     5008d04: 84000037  ldwio r16,0(r16)
     5008d08: 84a1c83a  sub r16,r16,r18
     5008d0c: 9809883a  mov r4,r19
     5008d10: 800b883a  mov r5,r16
     5008d14: 000feb00  call feb0 <printf>
  • Altera_Forum (Honored Contributor)

    Actually, I just meant: look at the calling function's assembly code to see whether some of the instructions before the 'call' are omitted for the second one.

    Assuming the instructions and data of the called function are all located in the tightly-coupled memory, and the code executes the same instructions on each call, then it should have the same execution time. But that doesn't mean the stack operations leading into the call will be the same both times. So look at the instructions before each 'call' to make sure you are not simply seeing additional work being done for the first call that is omitted for the second one.
  • Altera_Forum (Honored Contributor)

    Thank you BadOmen,

    I understand what you are saying. However, I still find this difference strange. The compiler did not do anything differently for the second call, and the difference grows as the number of loop iterations increases!

    I need somebody to help me understand this issue.
  • Altera_Forum (Honored Contributor)

    I'd guess at the dynamic branch predictor changing the latency of some branches.

    There is a hidden menu (in SOPC Builder, at least) with some extra Nios CPU options. One of them removes the dynamic branch-prediction logic and uses static prediction instead: 'assume backwards taken' and 'forwards not taken'.

    It's actually a shame there isn't an 'assume all not taken' option.

    I needed to minimise the worst-case path, so I had to persuade gcc to generate forward branches to backwards jumps in quite a few places (by adding an asm volatile () that only contains comments).
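    The asm-volatile trick mentioned above can be sketched as follows. This is a sketch of one plausible shape, assuming GCC; the exact code layout produced is version- and target-dependent. The asm body contains only an assembler comment, so it emits no instructions, but GCC must treat it as opaque and so tends to keep that arm out of the straight-line path, leaving the common path as a forward not-taken branch, which matches the static 'forwards not taken' rule.

    ```c
    /* error_count and process() are hypothetical names for illustration. */
    static int error_count;

    int process(int value)
    {
        if (value < 0) {
            /* Comment-only asm: no instructions emitted, but opaque to
             * GCC, nudging it to place this rare arm out of line. */
            __asm__ volatile ("# keep this error path out of line");
            error_count++;
            return -1;
        }
        return value * 2;   /* common fast path, falls through */
    }
    ```

    Checking the objdump output afterwards is the only way to confirm the branches actually landed in the intended direction.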
  • Altera_Forum (Honored Contributor)

    Thank you dsl, that makes sense. However, is there any trick to force the Nios II/f to use the static branch predictor?

  • Altera_Forum (Honored Contributor)

    SOPC Builder has a hidden menu that allows some additional configuration of the Nios II CPU.

    If you ask your FAE, they might tell you how to find it.

    I'm not sure its presence is supposed to be public knowledge!