Forum Discussion
Altera_Forum
Honored Contributor
15 years ago
I got curious, so I compiled the code you posted for both Nios and AVR and compared the compiled code.
One thing I noticed that may account for the lack of expected performance is that Nios does not use any hardware acceleration for FP comparisons. For example, your inner loop does a comparison with the limit variable. On Nios this calls subroutines totaling about 183 opcodes. (Fewer than this will actually execute because of the different possible branches taken, but it is a quick number for comparison purposes.)

I also looked at the AVR code for the same comparison. If you link in the hand-optimized assembler libm.a library, there are a total of about 40 opcodes, which is a lot more efficient than the Nios case. If you don't use libm.a but instead let GCC use its own library (as is done with Nios), then the total is a little over 400 opcodes. Are you using libm.a with your AVR tests? If you are just trying to compare the performance of the architectures, a fairer test would be to not use libm.a with AVR.

I have a feeling the lack of performance is mostly due to the poorly optimized FP operations, for which Nios does not have hardware acceleration. I wonder if there is an easy way to accelerate the FP comparisons with Nios? It would probably make a big difference if you did this.

Another similar inefficiency I noticed is the conversion of x, y and limit to floating point values. I suspect even changing the compare against limit to a compare against a constant value of 4 will make a small but noticeable difference. The compiler is not taking advantage of the fact that limit is actually a constant.
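As a rough sketch of one way the comparison might be made cheaper without hardware FP support: for non-negative IEEE-754 single-precision values, the unsigned integer ordering of the bit patterns matches the floating-point ordering, so a compare against the constant 4 can be done with a plain integer compare. The code you posted is not reproduced here, so everything below is an assumption: the function names (float_bits, exceeds_limit, escape_iterations) are hypothetical, and the loop is only a typical escape-time loop in which the tested value is a sum of squares and therefore never negative. I have not measured this on Nios, so treat it as something to try, not a confirmed speedup.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

/* The bit trick below assumes float is a 32-bit IEEE-754 value. */
static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");

/* Bit pattern of the constant 4.0f in IEEE-754 single precision. */
#define LIMIT_BITS 0x40800000u

/* Copy the bits of a float into an integer (memcpy avoids aliasing issues). */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/*
 * Returns 1 if v > 4.0f, using only an integer compare.
 * Assumption: v is non-negative (for example a sum of squares).
 * Negative inputs, including -0.0f, give wrong answers here.
 */
static int exceeds_limit(float v)
{
    return float_bits(v) > LIMIT_BITS;
}

/*
 * Hypothetical escape-time inner loop, NOT the code from the thread.
 * The limit is the constant 4 instead of a variable, and all literals
 * carry the f suffix so nothing is promoted to double.
 */
static int escape_iterations(float cx, float cy, int max_iter)
{
    float x = 0.0f, y = 0.0f;
    int i;

    for (i = 0; i < max_iter; i++) {
        float xx = x * x;
        float yy = y * y;

        if (exceeds_limit(xx + yy))   /* integer compare, no FP compare call */
            break;

        y = 2.0f * x * y + cy;
        x = xx - yy + cx;
    }
    return i;
}
```

Calling escape_iterations(0.0f, 0.0f, 50) runs all 50 iterations, while escape_iterations(2.0f, 2.0f, 50) stops after the first one. Whether the compiler really drops the software compare routine depends on the toolchain and optimization level, so it is worth checking the generated assembly the same way as above.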