Altera_Forum
Honored Contributor
15 years ago
Read the Nios II instruction set :-)
Basically, the load/store addressing mode is a 16-bit signed offset from a register. One of the registers, 'gp' (r26), is reserved to point into the middle of a 'small data' area. Variables within this area are accessed using offsets from 'gp', which gives single-instruction access. Anything of 4 bytes or less (the threshold is gcc's -G option), and anything put into a section whose name starts with .sdata or .sbss, is accessed as an offset from 'gp'.

To access variables that aren't in the small data segment, the compiler generates a pointer to the data item (2 instructions) and then dereferences it. It isn't possible to tell the compiler that the low 16 bits of the constant could be folded into the offset field of the load/store instruction, and in any case that would only be possible for accesses to non-aggregate items, which end up in the small data segment anyway. If a data item is used multiple times, the compiler will often keep its address in a register (although I've seen it forget and load the same address into a second one!). So placing things in the .sdata area reduces register pressure as well as instruction count.

There is a slight problem, though: the code in gcc that handles 'small data' was really designed for processors that can access memory either side of address 0 with single instructions. As such, it doesn't expect to be able to add in a constant offset. This shows up when you put an array into the small data area: the compiler won't add the array index to 'gp' and then use the array's base address (as an offset from gp) as the offset from that result. If you put all your data into a structure, and use a global register variable to point to the start of it, then that optimisation does happen.

Now with:

    volatile alt_u32 *uart_pointer;
    *uart_pointer = 0xDEADBEEF;

The assignment requires the compiler to generate code to read 'uart_pointer' and then write to the UART register: at least two instructions and two memory cycles (and possibly a two-cycle stall waiting for the read, unless the instructions can be reordered). On the other hand, directly addressing the constant UART address ought to take 2 instructions (one to load the high bits of the address into a register, the second to do the UART access). I suspect it is difficult to get the compiler not to generate a 3-instruction sequence, especially without my patches! However, if you get the UART registers inside the area addressable from 'gp', you get single-instruction access.