Altera_Forum
Honored Contributor
15 years ago
I would look at the generated code; the instruction set is fairly simple to understand.
To get any performance, you need to ensure everything is compiled with -O2 or -O3; this will also make the generated code easier to understand. Use 'gcc -S -fverbose-asm -O3 -o foo.S foo.c'.

If you are processing the data sequentially, a small data cache (with 32-byte lines) should improve things. If you are indexing the same offsets in each array, check the cache associativity (I think it doesn't have any, i.e. it is direct-mapped!). In that case you may want to offset the three arrays from each other so that the same index uses different cache lines.

My system has 16Mb of SDRAM for buffers, but since these are accessed randomly (one byte from each buffer), I don't use the data cache at all.
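A minimal sketch of the offset trick, assuming (as suggested above) a direct-mapped data cache, here taken to be 4 KB with 32-byte lines. The 4 KB size is an assumption; the real size depends on how the CPU is configured. All names (`dcache_line_of`, `bad_layout`, `good_layout`) are hypothetical. `dcache_line_of()` models which cache line an address maps to. In `bad_layout` each array is exactly one cache size long, so `a[i]`, `b[i]` and `c[i]` all land on the same line and evict each other. `good_layout` inserts one line of padding after each array, so the same index hits three different lines.

```c
#include <stdint.h>

#define DCACHE_BYTES 4096u        /* assumed data cache size */
#define DCACHE_LINE  32u          /* line size mentioned in the post */
#define ARRAY_LEN    DCACHE_BYTES /* each array is exactly one cache size long */

/* Index of the line a byte address maps to in a direct-mapped cache. */
static unsigned dcache_line_of(uintptr_t addr)
{
    return (unsigned)((addr / DCACHE_LINE) % (DCACHE_BYTES / DCACHE_LINE));
}

/* Arrays packed back to back: b and c start a whole cache size after a and b,
   so a[i], b[i] and c[i] map to the same line and thrash. */
struct bad_layout {
    uint8_t a[ARRAY_LEN];
    uint8_t b[ARRAY_LEN];
    uint8_t c[ARRAY_LEN];
};

/* One line of padding after each array shifts b by one line and c by two,
   so the same index uses three different cache lines. */
struct good_layout {
    uint8_t a[ARRAY_LEN];
    uint8_t pad1[DCACHE_LINE];
    uint8_t b[ARRAY_LEN];
    uint8_t pad2[DCACHE_LINE];
    uint8_t c[ARRAY_LEN];
};
```

Putting the arrays in one struct fixes their relative offsets wherever the linker places it. Because the padding is exactly one line, `b[i]` always maps one line after `a[i]`, whatever the struct's base address.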