Altera_Forum
10 years ago

Arria10 (A10) DSP block mapping problem

Dear all,

I compiled the same RTL design for both Stratix V (5SGXEA7) and Arria 10 (10AX115N3), but the DSP blocks are used differently for the same VHDL description of a floating-point multiplier. The FP multiplier was generated by the open-source tool FloPoCo.

The 24x24 significand multiplication (48-bit product) is mapped to a 36x36 DSP block on Stratix V, whereas on Arria 10 it is mapped to a single 18x18 DSP block plus adder logic cells.

Quartus II version 15.1 was used with the same compilation options for both FPGAs.

As a result, a timing violation occurs on Arria 10 because of the adder logic cells, with slack = -1.8 at 180 MHz.

The multiplication with a 36x36 DSP on Stratix V has a shorter delay and no timing violation at 180 MHz.

Is there any way to solve this problem?

Is this a bug in Quartus II 15.1?

The VHDL code is shown below.

-- Beginning of code generated by BitHeap::generateCompressorVHDL
-- code generated by BitHeap::generateSupertileVHDL()
----------------Synchro barrier, entering cycle 0----------------
DSP_bh9_ch0_0 <= ("" & XX_m8(23 downto 0) & "00000000000") * ("" & YY_m8(23 downto 0) & "00000000000");
heap_bh9_w47_0 <= DSP_bh9_ch0_0(69); -- cycle= 0 cp= 2.905e-09
heap_bh9_w46_0 <= DSP_bh9_ch0_0(68); -- cycle= 0 cp= 2.905e-09
heap_bh9_w45_0 <= DSP_bh9_ch0_0(67); -- cycle= 0 cp= 2.905e-09
heap_bh9_w44_0 <= DSP_bh9_ch0_0(66); -- cycle= 0 cp= 2.905e-09
heap_bh9_w43_0 <= DSP_bh9_ch0_0(65); -- cycle= 0 cp= 2.905e-09
heap_bh9_w42_0 <= DSP_bh9_ch0_0(64); -- cycle= 0 cp= 2.905e-09
heap_bh9_w41_0 <= DSP_bh9_ch0_0(63); -- cycle= 0 cp= 2.905e-09
heap_bh9_w40_0 <= DSP_bh9_ch0_0(62); -- cycle= 0 cp= 2.905e-09
heap_bh9_w39_0 <= DSP_bh9_ch0_0(61); -- cycle= 0 cp= 2.905e-09
heap_bh9_w38_0 <= DSP_bh9_ch0_0(60); -- cycle= 0 cp= 2.905e-09
heap_bh9_w37_0 <= DSP_bh9_ch0_0(59); -- cycle= 0 cp= 2.905e-09
heap_bh9_w36_0 <= DSP_bh9_ch0_0(58); -- cycle= 0 cp= 2.905e-09
heap_bh9_w35_0 <= DSP_bh9_ch0_0(57); -- cycle= 0 cp= 2.905e-09
heap_bh9_w34_0 <= DSP_bh9_ch0_0(56); -- cycle= 0 cp= 2.905e-09
heap_bh9_w33_0 <= DSP_bh9_ch0_0(55); -- cycle= 0 cp= 2.905e-09
heap_bh9_w32_0 <= DSP_bh9_ch0_0(54); -- cycle= 0 cp= 2.905e-09
heap_bh9_w31_0 <= DSP_bh9_ch0_0(53); -- cycle= 0 cp= 2.905e-09
heap_bh9_w30_0 <= DSP_bh9_ch0_0(52); -- cycle= 0 cp= 2.905e-09
heap_bh9_w29_0 <= DSP_bh9_ch0_0(51); -- cycle= 0 cp= 2.905e-09
heap_bh9_w28_0 <= DSP_bh9_ch0_0(50); -- cycle= 0 cp= 2.905e-09
heap_bh9_w27_0 <= DSP_bh9_ch0_0(49); -- cycle= 0 cp= 2.905e-09
heap_bh9_w26_0 <= DSP_bh9_ch0_0(48); -- cycle= 0 cp= 2.905e-09
heap_bh9_w25_0 <= DSP_bh9_ch0_0(47); -- cycle= 0 cp= 2.905e-09
heap_bh9_w24_0 <= DSP_bh9_ch0_0(46); -- cycle= 0 cp= 2.905e-09
heap_bh9_w23_0 <= DSP_bh9_ch0_0(45); -- cycle= 0 cp= 2.905e-09
heap_bh9_w22_0 <= DSP_bh9_ch0_0(44); -- cycle= 0 cp= 2.905e-09
heap_bh9_w21_0 <= DSP_bh9_ch0_0(43); -- cycle= 0 cp= 2.905e-09
heap_bh9_w20_0 <= DSP_bh9_ch0_0(42); -- cycle= 0 cp= 2.905e-09
heap_bh9_w19_0 <= DSP_bh9_ch0_0(41); -- cycle= 0 cp= 2.905e-09
heap_bh9_w18_0 <= DSP_bh9_ch0_0(40); -- cycle= 0 cp= 2.905e-09
heap_bh9_w17_0 <= DSP_bh9_ch0_0(39); -- cycle= 0 cp= 2.905e-09
heap_bh9_w16_0 <= DSP_bh9_ch0_0(38); -- cycle= 0 cp= 2.905e-09
heap_bh9_w15_0 <= DSP_bh9_ch0_0(37); -- cycle= 0 cp= 2.905e-09
heap_bh9_w14_0 <= DSP_bh9_ch0_0(36); -- cycle= 0 cp= 2.905e-09
heap_bh9_w13_0 <= DSP_bh9_ch0_0(35); -- cycle= 0 cp= 2.905e-09
heap_bh9_w12_0 <= DSP_bh9_ch0_0(34); -- cycle= 0 cp= 2.905e-09
heap_bh9_w11_0 <= DSP_bh9_ch0_0(33); -- cycle= 0 cp= 2.905e-09
heap_bh9_w10_0 <= DSP_bh9_ch0_0(32); -- cycle= 0 cp= 2.905e-09
heap_bh9_w9_0 <= DSP_bh9_ch0_0(31); -- cycle= 0 cp= 2.905e-09
heap_bh9_w8_0 <= DSP_bh9_ch0_0(30); -- cycle= 0 cp= 2.905e-09
heap_bh9_w7_0 <= DSP_bh9_ch0_0(29); -- cycle= 0 cp= 2.905e-09
heap_bh9_w6_0 <= DSP_bh9_ch0_0(28); -- cycle= 0 cp= 2.905e-09
heap_bh9_w5_0 <= DSP_bh9_ch0_0(27); -- cycle= 0 cp= 2.905e-09
heap_bh9_w4_0 <= DSP_bh9_ch0_0(26); -- cycle= 0 cp= 2.905e-09
heap_bh9_w3_0 <= DSP_bh9_ch0_0(25); -- cycle= 0 cp= 2.905e-09
heap_bh9_w2_0 <= DSP_bh9_ch0_0(24); -- cycle= 0 cp= 2.905e-09
heap_bh9_w1_0 <= DSP_bh9_ch0_0(23); -- cycle= 0 cp= 2.905e-09
heap_bh9_w0_0 <= DSP_bh9_ch0_0(22); -- cycle= 0 cp= 2.905e-09
----------------Synchro barrier, entering cycle 0----------------

Mickycat
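
For readers following the arithmetic in the generated code: each 24-bit significand is padded with 11 zero bits, so the multiplier operands are 35 bits wide and the product is 70 bits wide. Because both operands are scaled by 2^11, the product is scaled by 2^22, which is why only bits 69 down to 22 are read out as the 48-bit result. The simulation-only check below is a minimal sketch of that identity. The entity name, test values, and variable names are hypothetical and are not part of the FloPoCo output.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Simulation-only sketch; all names here are hypothetical.
entity tb_padding_check is
end entity tb_padding_check;

architecture sim of tb_padding_check is
  type vec_array is array (natural range <>) of unsigned(23 downto 0);
  -- A few arbitrary 24-bit test significands.
  constant TESTS : vec_array(0 to 3) :=
    (x"FFFFFF", x"800000", x"000001", x"ABCDEF");
  -- Same 11-bit zero padding as in the generated code.
  constant ZEROS : unsigned(10 downto 0) := (others => '0');
begin
  check : process
    variable padded   : unsigned(69 downto 0);  -- 35 x 35 -> 70 bits
    variable unpadded : unsigned(47 downto 0);  -- 24 x 24 -> 48 bits
  begin
    for i in TESTS'range loop
      for j in TESTS'range loop
        padded   := (TESTS(i) & ZEROS) * (TESTS(j) & ZEROS);
        unpadded := TESTS(i) * TESTS(j);
        -- Bits 69..22 of the padded product are the plain 48-bit product.
        assert padded(69 downto 22) = unpadded
          report "upper bits differ from the unpadded product"
          severity failure;
        -- The low 22 bits carry only the padding and are always zero.
        assert padded(21 downto 0) = 0
          report "low bits are not zero"
          severity failure;
      end loop;
    end loop;
    report "padding check passed";
    wait;
  end process check;
end architecture sim;
```

The check suggests that the padding affects only the operand width seen by synthesis, not the numerical result.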

2 Replies

  • Altera_Forum

    The DSP blocks in Stratix V and Arria 10 are fundamentally different. Whilst Stratix V supports 36 x 36 multipliers by using four 18 x 18 multipliers from two adjacent DSP blocks, Arria 10 does not support a 36 x 36 mode. The Arria 10 handbook does discuss 27 x 27 multiplication, but only for fixed point.

    Refer to the 'Variable Precision DSP Blocks' section of the Handbook for each device family and look at the 'Operational Mode' table (Table 1).

    Supported Operational Modes in Stratix V Devices (https://documentation.altera.com/#/00005262-aa$nt00072904)

    Supported Operational Modes in Arria® 10 Devices (https://documentation.altera.com/#/00045071-aa$aa00044854)

    Do you need back-to-back multiplications, one every clock cycle? I can only suggest that you consider performing the same calculation over multiple clock cycles. I doubt a device speed grade change will mop up that much slack, but I hope you've tried that.

    Cheers,

    Alex
  • Altera_Forum

    Hi Alex,

    Thank you for your comments.

    I know that Arria 10 does not support a 36x36 (integer) DSP mode.

    So I am trying to reduce the bit width of the multiplication so that it fits the 27x27 DSPs.

    P.S. In the near future, I will also try to use the floating-point DSPs of Arria 10 for our scientific simulation application.

    Thanks a lot,

    Mickycat
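
The bit-width reduction mentioned in the last reply can be sketched as follows. This is a minimal example under stated assumptions, not FloPoCo output: the entity name `sig_mult_24x24`, its ports, and the register names are hypothetical. It multiplies the two 24-bit significands directly, without zero padding, so both operands stay within 27 bits. The input and output registers are a common way to give synthesis something to pack into a DSP block, but they add two cycles of latency that the original single-cycle assignment does not have. Whether Quartus actually maps this to one 27x27 block depends on the tool version and settings, so the DSP usage in the fitter report should be checked.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical wrapper: entity, port, and signal names are illustrative
-- and are not part of the FloPoCo-generated design.
entity sig_mult_24x24 is
  port (
    clk : in  std_logic;
    x   : in  std_logic_vector(23 downto 0);  -- significand X, no padding
    y   : in  std_logic_vector(23 downto 0);  -- significand Y, no padding
    p   : out std_logic_vector(47 downto 0)   -- full 48-bit product
  );
end entity sig_mult_24x24;

architecture rtl of sig_mult_24x24 is
  signal x_r : unsigned(23 downto 0) := (others => '0');
  signal y_r : unsigned(23 downto 0) := (others => '0');
  signal p_r : unsigned(47 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Stage 1: register the operands.
      x_r <= unsigned(x);
      y_r <= unsigned(y);
      -- Stage 2: 24 x 24 unsigned multiply, 48-bit result.
      p_r <= x_r * y_r;
    end if;
  end process;

  p <= std_logic_vector(p_r);
end architecture rtl;
```

Bit `p(k)` here corresponds to bit `k + 22` of the padded product `DSP_bh9_ch0_0` in the original post, so the `heap_bh9_w*` assignments would need their indices shifted accordingly.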