Forum Discussion

Altera_Forum
Honored Contributor
15 years ago

Math functions (sine & cosine) latency

Can anyone provide latency data for the single-precision sine and cosine functions from the math.h library on Cyclone III (or possibly Cyclone II) with floating-point hardware acceleration, hardware division included? I need it for my engineer's thesis and would be thankful for any help.

10 Replies

  • Altera_Forum

    The library is compiled with software floating point, so a hardware FPU won't help any of the functions defined in math.h.

    I am also seeking ways to recompile the math functions with hardware FPU support.
  • Altera_Forum

    --- Quote Start ---

    So hardware FPU won't help any functions defined in math.h.

    --- Quote End ---

    That's not true. In the tutorial "Using Nios II Floating-Point Custom Instructions", you can find:

    --- Quote Start ---

    Table 1–2 indicates which math library functions use floating-point, and of those, which use floating-point division. If a function uses floating-point, it runs faster with floating-point hardware. If a function uses floating-point division, it runs even faster with floating-point division hardware.

    --- Quote End ---

    So the use of the floating point custom instruction affects the computation speed of the sine and cosine.
  • Altera_Forum

    Thinks ....

    Having written a soft-float package (in ARM assembler), it strikes me that a few simple combinatorial custom instructions (possibly just one that uses the rB field to decide what to do) would speed up soft float somewhat!

    Likely candidates:

    - Extract exponent, detecting NaN and Infinity

    - Extract and normalise mantissa (with and without sign)

    - Count leading zeros and count leading ones

    There also needs to be an easy way of doing 'add with carry' and 64bit shifts (for normalising values).

    Something built that way would be significantly faster than full soft-float, while using much less FPGA real estate than the current custom code.

    It would also let you write 'double' (and maybe long double - 64 bit mantissa) support.
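
The candidate operations listed above can be modelled in plain C to see what each custom instruction would have to compute. This is only a sketch for IEEE 754 single precision; the helper names (`float_bits`, `sf_exponent`, `clz32`, `normalise32`, and so on) are made up for illustration and are not part of any Altera library.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: each models what one of the suggested
   combinatorial custom instructions would compute. */

/* Reinterpret the bits of a float as a 32-bit word. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Biased 8-bit exponent field (0 = zero/subnormal, 255 = Inf/NaN). */
static unsigned sf_exponent(uint32_t bits)
{
    return (bits >> 23) & 0xFFu;
}

/* Infinity: exponent all ones, fraction zero. */
static int sf_is_inf(uint32_t bits)
{
    return sf_exponent(bits) == 0xFFu && (bits & 0x007FFFFFu) == 0;
}

/* NaN: exponent all ones, fraction nonzero. */
static int sf_is_nan(uint32_t bits)
{
    return sf_exponent(bits) == 0xFFu && (bits & 0x007FFFFFu) != 0;
}

/* 24-bit significand with the implicit leading 1 restored for normal
   numbers; subnormals are returned as stored (no hidden bit). */
static uint32_t sf_mantissa(uint32_t bits)
{
    uint32_t frac = bits & 0x007FFFFFu;
    return sf_exponent(bits) != 0 ? (frac | 0x00800000u) : frac;
}

/* Count leading zeros of a 32-bit word; returns 32 for zero. */
static unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Left-justify a nonzero significand so its top set bit sits at bit 31;
   *shift receives the number of positions moved. */
static uint32_t normalise32(uint32_t m, unsigned *shift)
{
    assert(m != 0);
    *shift = clz32(m);
    return m << *shift;
}
```

In software each of these costs several instructions (the leading-zero count is a loop), which is why single-cycle hardware versions could plausibly speed up a soft-float package.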
  • Altera_Forum

    --- Quote Start ---

    That's not true. In the tutorial "Using Nios II Floating-Point Custom Instructions", you can find:

    So the use of the floating point custom instruction affects the computation speed of the sine and cosine.

    --- Quote End ---

    Then how can I get the math library to work with the hardware FPU?

    I have tested sin/cos with and without the hardware FPU. The performance is the same.

    Is there any configuration I need to set? Thanks.
  • Altera_Forum

    If you have integrated the custom FP instruction with the Nios II processor (in SOPC Builder), there is nothing in particular you need to do to enable it; it is automatic.

    Personally, I have used it for a project (for the computation of the arctan function) and I have seen a difference.

    Are you sure that you are using float variables and not double?

    Check the "Floating-Point Instructions" part of the Nios II Processor Reference Handbook for the details, maybe you have missed something.
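
The float-versus-double question matters because standard C quietly promotes to double in several places, and a single-precision FPU cannot help with double arithmetic. A minimal sketch of the usual pitfalls follows; the function names are hypothetical, and `sizeof` is used only to expose the type of each expression (its operand is not evaluated, so no math routine is actually called).

```c
#include <math.h>
#include <stddef.h>

/* sin() takes and returns double, so a float argument is promoted and
   the double-precision routine is selected. */
static size_t size_of_sin_result(void)
{
    float x = 0.5f;
    return sizeof(sin(x));
}

/* sinf() is the single-precision variant and returns float. */
static size_t size_of_sinf_result(void)
{
    float x = 0.5f;
    return sizeof(sinf(x));
}

/* An unsuffixed literal such as 2.0 has type double, so the whole
   multiplication is carried out in double. */
static size_t size_of_mul_by_double_literal(void)
{
    float x = 0.5f;
    return sizeof(x * 2.0);
}

/* With the f suffix the expression stays in single precision. */
static size_t size_of_mul_by_float_literal(void)
{
    float x = 0.5f;
    return sizeof(x * 2.0f);
}
```

If the first or third pattern appears in the code being timed, the work is done in double precision even though the variables are declared float.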
  • Altera_Forum

    It might be worth disassembling a test program and following the calls to see what code is actually generated/called.

    Probably the easiest way to determine if the custom instructions are actually being used!
  • Altera_Forum

    --- Quote Start ---

    If you have integrated the custom FP instruction with the Nios II processor (in SOPC Builder), there is nothing in particular you need to do to enable it; it is automatic.

    Personally, I have used it for a project (for the computation of the arctan function) and I have seen a difference.

    Are you sure that you are using float variables and not double?

    Check the "Floating-Point Instructions" part of the Nios II Processor Reference Handbook for the details, maybe you have missed something.

    --- Quote End ---

    I have run the test code from Altera.

    Only +, - and * work well with the FPU. Divide doesn't work.

    I did tick the optional divider.

    None of the math functions work either.

    I have tried sinf(). No difference.
  • Altera_Forum

    I think I found the same problem as you; however, I've found a workaround.

    I am using the Nios II/f core with custom floating-point instructions (including hardware divide support) and Quartus/Nios II EDS 10.0 on Linux.

    What I've found is that the linker always uses the generic libm.a library (for cosf, sinf et al.), no matter what kind of hardware implementation you have.

    So what I did was replace the generic libm.a in altera/10.0/nios2eds/bin/gnu/H-i686-pc-linux-gnu/nios2-elf/lib with the libm.a found in altera/10.0/nios2eds/bin/nios2-gnutools/H-i686-pc-linux-gnu/nios2-elf/lib/mcustom-fpu-cfg=60-2

    With this change, a system that was running at 200 Hz jumped to 1100 Hz, so the gain in performance is considerable.

    You can always check the objdump output to see whether the generated assembly calls mulsf3 or uses the custom instructions.
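
To compare builds before and after a change like the library swap above, a small timing harness helps. The sketch below is an assumption-laden stand-in: it uses the portable `clock()` so it runs anywhere, whereas on a Nios II HAL system one would more likely read a timestamp timer (for example `alt_timestamp()` from `sys/alt_timestamp.h`, assuming such a timer is configured in the system). The names `unary_fn`, `passthrough` and `average_ticks_per_call` are hypothetical.

```c
#include <time.h>

/* Any single-precision function of one argument, e.g. sinf or cosf. */
typedef float (*unary_fn)(float);

/* Baseline with the same call shape as the function under test; timing
   it gives an estimate of the loop and call overhead. */
static float passthrough(float x)
{
    return x;
}

/* Average clock() ticks per call of fn over `iterations` calls.
   The argument is volatile so the compiler cannot hoist or fold the
   call, and the accumulated result is written to *sink so the loop
   body is not optimised away. */
static double average_ticks_per_call(unary_fn fn, unsigned long iterations,
                                     float *sink)
{
    volatile float arg = 0.001f;
    float acc = 0.0f;
    unsigned long i;
    clock_t start, stop;

    start = clock();
    for (i = 0; i < iterations; i++)
        acc += fn(arg);
    stop = clock();

    *sink = acc;
    return (double)(stop - start) / (double)iterations;
}
```

Calling it once with `sinf` and once with `passthrough`, then subtracting the two averages, gives a rough per-call cost for `sinf` alone. The resolution of `clock()` is coarse, so a large iteration count is needed for a meaningful figure.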
  • Altera_Forum

    I seem to recall that the math library uses lookup tables, so I don't think the FPUCI will help much (it's been a while, so I could be wrong). If you can get your hands on a dedicated sine/cosine hardware block and integrate it as a custom instruction, you can pass these options to the compiler to target it directly:

    -mcustom-fsins=<x> -mcustom-fcoss=<y>

    <x> and <y> are the custom instruction indexes you assigned. There are double precision equivalents but I would start off with single precision since there are other things you have to take care of to target double precision.