Forum Discussion

Altera_Forum
7 years ago

Reduce logic utilization

Hi,

I have this part in my kernel where it takes too much logic


if (relu == 1) {
    if (out < 0)
        conv_in = 0.1f * out;
    else
        conv_in = out;
}

out is a float. report.html shows this function using about 4K ALUTs and 8K FFs, which is too much for my DE1-SoC to handle. Any idea how to reduce it?

Btw, the function is a leaky activation function where negative data is multiplied by 0.1.

Thanks in advance.

EDIT:

What are the ups and downs of using these two compiler flags?

1) -fp-relaxed

2) -fpc

6 Replies

  • Altera_Forum

    Since floating-point operations are not natively supported by the DSPs in Cyclone V, a floating-point multiplication maps only the mantissa multiply to DSPs; all other operations, including shifting (with barrel shifters) and rounding, use logic and FFs. This is expected behavior and cannot be avoided unless you give up IEEE-754 compliance.

    --fp-relaxed allows parallelizing chained floating-point operations into a tree, which requires reordering the operations. This can slightly reduce the logic/FF overhead at the cost of small changes in the output. However, it will not necessarily make any difference in your kernel unless you have chained floating-point operations.

    --fpc can significantly reduce logic and FF overhead of floating-point operations by reducing the area spent on rounding functions, at the cost of losing compliance with the IEEE-754 standard; i.e. if you use that switch, you could get very different (i.e. inaccurate) results compared to running the same code on a CPU/GPU.

    Another option you have is to use fixed-point numbers. Altera's documents outline how you can use bit masking to convert floating-point numbers to fixed-point in an OpenCL kernel.
  • Altera_Forum

    jack12, try replacing "conv_in = 0.1*out" with "conv_in = 0.125*out", or with "conv_in = 0.125*out - 0.03125*out" for more precision -- these expressions are cheaper to implement.

  • Altera_Forum

    The kernel mainly performs floating-point convolutions repeatedly. Anyway, I will try to verify my results and compare them with the compiler flags on. Thanks HRZ

  • Altera_Forum

    Hi WitFed,

    I am trying to reduce the logic utilization, as the design cannot fit into the FPGA. I am confused why conv_in = 0.125*out - 0.03125*out would reduce logic utilization. Shouldn't the subtractor use more logic?
  • Altera_Forum

    Because there is no exact 0.1 in hardware.

    If you use 0.1, which has no finite binary representation, the compiler will use a lot of hardware to implement a multiplication by a number as close as possible to 0.1.

    However, 0.125 and 0.03125 are powers of two (2^-3 and 2^-5), so (0.125 - 0.03125)*out is like (out >> 3) - (out >> 5): just shifts and a subtract.