"FPGA multiplication custom instructions" sounds like the integer hardware multiplier support in the NIOS processor itself.
Floating point support is separate from that. You need to include the floating point custom instruction hardware in your Qsys/SOPC Builder project and connect it to your NIOS, then regenerate your BSP and recompile everything.
In other words, it sounds like your software is currently using software emulation of floating point operations.
I believe floating point multiply / add operations take around 6 clocks when done in hardware (ALTFP_MULT megafunction etc.). See:
http://www.altera.com/literature/tt/tt_floating_point_custom_instructions.pdf
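
If you want to check which case you are in, one option is to time a loop of dependent float multiplies before and after adding the custom instruction hardware. Below is a minimal sketch of that idea, not anything taken from the Altera docs. The helper names fmul_chain and ticks_per_fmul are made up for illustration, and it assumes the standard C clock() is usable on your target; on a NIOS HAL system you would probably swap in the timestamp timer (sys/alt_timestamp.h) to get actual clock cycles.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: runs a chain of dependent single-precision
 * multiplies. The accumulator is volatile so the compiler cannot
 * fold the loop away or reorder it out of the timed region. */
float fmul_chain(float seed, float factor, uint32_t iterations)
{
    volatile float acc = seed;
    for (uint32_t i = 0; i < iterations; ++i) {
        acc = acc * factor;   /* one float multiply per iteration */
    }
    return acc;
}

/* Hypothetical helper: average clock() ticks per multiply, including
 * loop and load/store overhead. Assumption: clock() has enough
 * resolution for the chosen iteration count. */
double ticks_per_fmul(uint32_t iterations)
{
    assert(iterations > 0);

    clock_t start = clock();
    (void)fmul_chain(1.0f, 1.0000001f, iterations);  /* f suffix keeps the math in float */
    clock_t stop = clock();

    /* The division is done in double, but it sits outside the timed region. */
    return (double)(stop - start) / (double)iterations;
}
```

Call something like ticks_per_fmul(100000) from main() and print the result for both builds. The absolute number includes loop overhead, so only the difference between the two builds is meaningful: if it barely changes after you add the hardware and regenerate the BSP, the compiler is most likely still calling the software emulation routines.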