Altera_Forum
Honored Contributor
14 years ago
I'm not sure exactly what the Dhrystone benchmark does...
Given the effect of cache sizes, you really need a test that closely matches the code you need to run, not some arbitrary benchmark.

For maximum performance you may need to look carefully at the generated code:
1) Compile with -O2 or -O3.
2) Arrange that all memory accesses are either relative to %gp, or done relative to a global register variable (slightly better than %gp).
3) Avoid instruction stalls following memory reads.
4) Avoid mispredicted branches; arrange for the common case to use the 'branch not taken' path if at all possible.
5) Avoid the compiler spilling registers to the stack.
6) Consider using custom instructions for some operations.

Some of the above are probably rather difficult if you are running any form of operating system!

I removed almost all the spare instructions, unnecessary memory accesses and pipeline stalls from some code that does HDLC in software. I got that down to 149 clocks (max) per byte for rx and tx.
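As a rough illustration of points 4 and 5 only, here is a minimal sketch in C. It is not the HDLC code mentioned above; scan_bytes, struct scan_result and the UNLIKELY macro are made-up names for the example. It assumes GCC (or a compatible compiler) for __builtin_expect, which is only a hint: whether the common case really ends up on the fall-through path, and whether the locals really stay in registers, has to be checked in the generated assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* __builtin_expect is a GCC/Clang extension; fall back to a plain
 * test on other compilers. */
#if defined(__GNUC__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

#define FLAG_BYTE 0x7Eu /* HDLC frame delimiter, assumed rare in the data */

/* Hypothetical result type for the example. */
struct scan_result {
    uint32_t sum;   /* sum of all non-flag bytes */
    uint32_t flags; /* number of flag bytes seen */
};

struct scan_result scan_bytes(const uint8_t *buf, size_t len)
{
    /* Keep the working state in a few locals so the compiler has a
     * chance to hold it in registers instead of spilling to the stack
     * or reloading it from memory on every iteration. */
    uint32_t sum = 0;
    uint32_t flags = 0;

    for (size_t i = 0; i < len; i++) {
        uint8_t b = buf[i];

        /* Mark the rare case as unlikely so the compiler is free to
         * lay out the common case as straight-line code. */
        if (UNLIKELY(b == FLAG_BYTE)) {
            flags++;
            continue;
        }
        sum += b;
    }

    struct scan_result r = { sum, flags };
    return r;
}
```

Compile with -O2 and read the disassembly of the loop to see which path the branch falls through to and whether any stack accesses remain inside it.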