In the snippet below, I get errors if alt_64 is used for t2, but none if alt_32 is used or if optimization is turned off.
For explanation:
The problematic original code simply does VELOCITY+=ACCELERATION. The code below is a temporary workaround.
ACCELERATION is always small (positive or negative, magnitude below 0x1000, controlled externally), so VELOCITY should never exceed +/-((2^31)-1); therefore the manual computation of VELOCITY_HI as a sign extension of VELOCITY_LO is valid.
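As a sanity check of that argument, here is a minimal sketch comparing the split update (low word plus sign-derived high word) with a plain 64-bit add. It assumes alt_32 and alt_64 are the usual signed 32- and 64-bit typedefs; the struct and the function names are hypothetical stand-ins for the DPRAM words and are not part of the original code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: alt_32 / alt_64 are plain signed 32/64-bit integers. */
typedef int32_t alt_32;
typedef int64_t alt_64;

/* Hypothetical stand-in for the two 32-bit VELOCITY words in DPRAM. */
struct velocity_words {
    alt_32 lo;
    alt_32 hi;
};

/* Workaround path: add to the low word, derive the high word from its sign. */
static struct velocity_words step_split(struct velocity_words v, alt_32 accel)
{
    /* Unsigned arithmetic avoids signed-overflow undefined behaviour. */
    v.lo = (alt_32)((uint32_t)v.lo + (uint32_t)accel);
    v.hi = (v.lo >= 0) ? 0 : -1;
    return v;
}

/* Reference path: the original 64-bit VELOCITY += ACCELERATION. */
static alt_64 step_wide(alt_64 velocity, alt_32 accel)
{
    alt_64 sum = velocity + (alt_64)accel;
    /* Precondition from the text: the result stays within 32 bits. */
    assert(sum >= INT32_MIN && sum <= INT32_MAX);
    return sum;
}

/* Reassemble the 64-bit value from its two words. */
static alt_64 join_words(struct velocity_words v)
{
    return (alt_64)(((uint64_t)(uint32_t)v.hi << 32) | (uint32_t)v.lo);
}
```

As long as the result fits into 32 bits, join_words(step_split(...)) and step_wide(...) should agree, which is what the comparison at the breakpoint relies on.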
When the breakpoint at the end is hit, it is the upper word of t1 (held in registers) that is wrong, not VELOCITY (in DPRAM).
You might be tempted to comment on the "DPRAM" definition; yes, the data is in a dual-ported RAM, and another CPU accesses the same area while this code runs in a loop. The other CPU regularly writes ACCELERATION and only reads VELOCITY. Both CPUs have 32-bit access to the DPRAM.
#define VELOCITY (*(volatile alt_64 *)((void *)DPRAM_BASE+8))
#define VELOCITY_LO (*(volatile alt_32 *)((void *)DPRAM_BASE+8))
#define VELOCITY_HI (*(volatile alt_32 *)((void *)DPRAM_BASE+12))
#define ACCELERATION (*(volatile alt_32 *)((void *)DPRAM_BASE+16))
...
while(1) {
...
alt_64 t2; /* it works if you use alt_32 here! */
alt_64 t1;
...
t2 = ACCELERATION;
t1 = VELOCITY + t2;
VELOCITY_LO += t2;
VELOCITY_HI = (VELOCITY_LO >= 0) ? 0 : -1;
/* Don't continue if an error occurred: good place for a breakpoint */
while(VELOCITY != t1)
asm volatile("nop");
...
} /* while(1) */
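For comparison, the variant that works for me (alt_32 t2) amounts to reading ACCELERATION exactly once into a 32-bit temporary and widening it afterwards. A minimal sketch of that pattern follows; the helper name read_accel_once is hypothetical, and alt_32/alt_64 are again assumed to be plain signed 32/64-bit typedefs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: alt_32 / alt_64 are plain signed 32/64-bit integers. */
typedef int32_t alt_32;
typedef int64_t alt_64;

static_assert(sizeof(alt_32) == 4, "alt_32 is assumed to be 32 bits wide");
static_assert(sizeof(alt_64) == 8, "alt_64 is assumed to be 64 bits wide");

/* Hypothetical helper: one 32-bit volatile read, then sign-extend. */
static alt_64 read_accel_once(const volatile alt_32 *accel_reg)
{
    alt_32 snapshot = *accel_reg;  /* the only access to the register */
    return (alt_64)snapshot;       /* widening uses the local copy */
}
```

In the loop this would be used as t2 = read_accel_once(&ACCELERATION); so that both words of t2 come from the same 32-bit sample.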
The following is the generated code for the working version with "alt_32 t2" on the left and the failing version with "alt_64 t2" on the right (only the part that corresponds to the snippet above). Everywhere else the two binaries are exactly the same.
DPRAM_BASE is 0x80180, so
- VELOCITY(_LO) is at 0x80188 (0x80000+392),
- VELOCITY_HI is at 0x8018C (0x80000+396), and
- ACCELERATION is at 0x80190 (0x80000+400)
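To make it easier to match the movhi/addi pairs in the listing to the macros, the address arithmetic can be written out as a small sketch. The enum and function names are hypothetical; only DPRAM_BASE and the offsets 8, 12 and 16 come from the definitions above.

```c
#include <assert.h>
#include <stdint.h>

#define DPRAM_BASE 0x80180u

/* Byte offsets taken from the macro definitions. */
enum { VELOCITY_LO_OFF = 8, VELOCITY_HI_OFF = 12, ACCELERATION_OFF = 16 };

/* Compile-time check of the three absolute addresses. */
static_assert(DPRAM_BASE + VELOCITY_LO_OFF == 0x80188u, "VELOCITY(_LO)");
static_assert(DPRAM_BASE + VELOCITY_HI_OFF == 0x8018Cu, "VELOCITY_HI");
static_assert(DPRAM_BASE + ACCELERATION_OFF == 0x80190u, "ACCELERATION");

/* Absolute address of a DPRAM field. */
static uint32_t dpram_addr(uint32_t offset)
{
    return DPRAM_BASE + offset;
}

/* "movhi rX,8" loads 8 << 16 = 0x80000; the addi supplies the remainder. */
static uint32_t addi_displacement(uint32_t addr)
{
    return addr - (8u << 16);
}
```

So a movhi/addi pair with displacement 400 refers to ACCELERATION, 392 to VELOCITY(_LO), and 396 to VELOCITY_HI.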
e4: movhi r4,8 | e4: movhi r6,8
e8: addi r4,r4,400 | e8: addi r6,r6,400
ec: ldw r9,0(r4) | ec: ldw r8,0(r6)
f0: ldw r2,0(r17) | f0: ldw r4,0(r6)
f4: ldw r3,4(r17) | f4: ldw r2,0(r17)
f8: ldw r8,0(r17) | f8: srai r5,r8,31
fc: mov r6,r9 | fc: ldw r3,4(r17)
100: srai r7,r9,31 | 100: ldw r8,0(r17)
104: add r8,r8,r9 | 104: add r6,r2,r4
108: stw r8,0(r17) | 108: movhi r10,8
10c: ldw r9,0(r17) | 10c: addi r10,r10,392
110: add r4,r2,r6 | 110: add r8,r8,r4
114: cmpltu r8,r4,r2 | 114: stw r8,0(r17)
118: cmplt r9,r9,zero | 118: ldw r9,0(r17)
11c: movhi r2,8 | 11c: cmpltu r8,r6,r2
120: addi r2,r2,396 | 120: movhi r2,8
124: sub r9,zero,r9 | 124: addi r2,r2,396
128: movhi r10,8 | 128: cmplt r9,r9,zero
12c: addi r10,r10,392 | 12c: sub r9,zero,r9
130: stw r9,0(r2) 130: stw r9,0(r2)
134: ldw r2,0(r10) 134: ldw r2,0(r10)
138: add r5,r3,r7 | 138: add r7,r3,r5
13c: add r8,r8,r5 | 13c: add r8,r8,r7
140: mov r6,r4 | 140: mov r3,r6
144: mov r7,r8 | 144: mov r4,r8
148: beq r2,r4,248 | 148: beq r2,r6,248 <alt_main+0x248>
14c: mov r3,r10 | 14c: mov r5,r10
150: nop 150: nop
154: ldw r2,0(r3) | 154: ldw r2,0(r5)
158: bne r2,r6,150 | 158: bne r2,r3,150 <alt_main+0x150>
15c: ldw r2,4(r3) | 15c: ldw r2,4(r5)
160: bne r2,r7,150 | 160: bne r2,r4,150 <alt_main+0x150>
Thanks for looking at the problem!
Kolja