Forum Discussion
Altera_Forum
Honored Contributor
14 years ago
I don't know what the numbers should be for a DE2 board, but I'll explain how the custom instructions work and how other factors play a role.
The floating point custom instructions perform +, -, *, and optionally /. Any time the Nios II processor executes the custom instruction ... instruction, it can potentially become a blocking operation. For example, if a custom instruction takes 6 clock cycles to complete, the processor pipeline stalls for 6 clock cycles waiting for the result. I forget how the tutorial software is written, but I assume it performs a series of floating point operations over an array of data inside a loop (one floating point operator per loop iteration). If all the data is cached this should be fairly quick; if not, the memory access times will add to the inefficiency of each floating point operation.

Another thing that will play a significant role is the compiler optimization level. Since I don't know much about the system you are running the FPU tutorial on, my hunch is that there is a lot of latency between the processor and the memory that stores the code. If the main memory is on a different clock domain, then I would expect the inefficiencies that you are seeing.
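To make the reasoning above concrete, here is a rough sketch in C. The first function is the kind of loop I am assuming the tutorial runs (one floating point operator per iteration); I have not checked the actual tutorial source. The second is a deliberately simplified cost model: the function names, parameters, and the idea that every memory access pays a fixed number of wait cycles are all my own assumptions for illustration, not measured Nios II behavior.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed shape of the tutorial loop: one floating point operator
 * (a multiply) per iteration. With the floating point custom
 * instructions enabled, the multiply would run in hardware and the
 * pipeline would wait for its result. */
void multiply_arrays(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] * b[i];
    }
}

/* Hypothetical, simplified cycle estimate for such a loop.
 *   ci_latency            cycles the pipeline stalls per custom instruction
 *   mem_accesses_per_iter loads/stores (and uncached fetches) per iteration
 *   mem_wait_cycles       extra wait cycles per access; 0 models a cache hit
 *   overhead_cycles       loop bookkeeping (index update, branch) per iteration
 * Real hardware overlaps some of these costs, so treat the result as a
 * rough upper-level picture rather than a prediction. */
uint64_t estimate_loop_cycles(uint64_t iterations,
                              uint32_t ci_latency,
                              uint32_t mem_accesses_per_iter,
                              uint32_t mem_wait_cycles,
                              uint32_t overhead_cycles)
{
    uint64_t per_iter = (uint64_t)ci_latency
                      + (uint64_t)mem_accesses_per_iter * mem_wait_cycles
                      + overhead_cycles;
    return iterations * per_iter;
}
```

With made-up numbers, 1000 iterations at a 6-cycle custom instruction, 3 memory accesses, and 4 cycles of loop overhead come to 10000 cycles when every access hits the cache (0 wait cycles), and 40000 cycles when each access waits 10 cycles. The floating point hardware is identical in both cases; only the memory latency changed, which is why I suspect the memory path rather than the custom instructions.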