Altera_Forum
Honored Contributor
13 years ago
-O3 is likely to generate faster code than -Os, whether or not it is smaller.
It is quite likely that the source code can be changed to significantly improve the performance. OTOH that is quite hard work on something as large as a TCP/IP stack. Typically it involves:
1) Stopping the compiler spilling registers to the stack (may involve reducing the number of 'live' values in the code).
2) Assigning intermediate values to locals if they are used multiple times and a memory write could alias the source.
3) Forcing values to be read from memory early to avoid pipeline stalls.
4) Putting as much data (and I/O) as possible where it can be referenced relative to %gp (reduces code size and pressure on registers).
5) Getting the static branch prediction right for every branch (and then disabling the dynamic branch predictor on the hidden menu).
6) Being willing to modify gcc.
At a guess you can get 30%+ improvement - unless the code has already been treated that way!
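As a rough illustration of points 2, 3 and 5, here is a minimal sketch in C. The struct `pkt_hdr`, both function names and the `UNLIKELY` macro are made up for this example and do not come from any real TCP/IP stack. It assumes gcc (or a compatible compiler) for `__builtin_expect`. Whether it actually helps on a given build has to be checked in the generated assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* gcc builtin: hints which way a branch normally goes (point 5). */
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Hypothetical packet header, for illustration only. */
struct pkt_hdr {
    uint16_t len;
    uint16_t flags;
};

/*
 * Naive version: out[] is a byte pointer, so the compiler must assume
 * each store could overwrite *hdr and may reload hdr->len from memory
 * on every loop iteration.
 */
size_t copy_payload_naive(const struct pkt_hdr *hdr,
                          const uint8_t *src, uint8_t *out)
{
    size_t i;
    for (i = 0; i < hdr->len; i++)
        out[i] = src[i];
    return i;
}

/*
 * Tuned version: the length is read once, early, into a local (points
 * 2 and 3), so later stores cannot force a reload. The empty-packet
 * case is marked as the unlikely path (point 5).
 */
size_t copy_payload_tuned(const struct pkt_hdr *hdr,
                          const uint8_t *src, uint8_t *out)
{
    const size_t len = hdr->len;
    size_t i;

    if (UNLIKELY(len == 0))
        return 0;

    for (i = 0; i < len; i++)
        out[i] = src[i];
    return len;
}
```

Both functions give the same result as long as out[] does not really overlap the header. The difference is only in what the compiler is allowed to keep in a register, so compare the generated assembly (gcc -S) for the two before assuming a gain.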