Altera_Forum
9 years ago

Single multiplier takes up a whole DSP block for

Hello,

I'm using a Cyclone V SoC FPGA.

Currently my design has 8 multipliers, which I inferred from VHDL code rather than instantiating them as IP.

The inputs to the multipliers are 12 and 16 bits wide.

According to this document:

https://www.altera.com/content/dam/altera-www/global/en_us/pdfs/literature/wp/wp-01159-arriav-cyclonev-dsp.pdf

I expected the tool to pack 2 multipliers into a single DSP block, so that 8 multipliers would consume only 4 DSP blocks.

Unfortunately, the compilation report shows that 8 DSP blocks are consumed (one per multiplier).

I tried changing the synthesis optimization to area-driven, but nothing changed.

Any idea what could cause this behavior?

3 Replies

  • Altera_Forum

    Can you show the VHDL code? Have you tried instantiating the multipliers from the IP Catalog instead of using code inference?

  • Altera_Forum

    --- Quote Start ---

    Have you tried instantiating the multipliers from the IP Catalog instead of using code inference?

    --- Quote End ---

    No.

    I preferred pure HDL since I want to parameterize the multiplier with generics during compilation.

    library ieee ;
    use ieee.std_logic_1164.all ;
    use ieee.numeric_std.all ;

    entity multiplier is
    generic 
    (
        LOCATION_FIRST_RESULT_BIT : natural ;   -- index of the lowest product bit kept in OUT_RESULT
        WIDTH_A : positive  ;
        WIDTH_B : positive  ;
        WIDTH_RESULT  : positive  
    ) ;
    port 
    (
        IN_A :   in  std_logic_vector ( WIDTH_A - 1 downto 0 ) ;  
        IN_B :   in  std_logic_vector ( WIDTH_B - 1 downto 0 ) ;  
        
        OUT_RESULT :   out std_logic_vector ( WIDTH_RESULT - 1 downto 0 ) 
    ) ;
    end entity multiplier ;

    architecture rtl_multiplier of multiplier is 
        -- full-precision signed product
        signal  signed_multiplier_result  : signed ( WIDTH_B + WIDTH_A - 1 downto 0 ) ;
    begin 
        signed_multiplier_result <=  signed ( IN_B ) * signed ( IN_A ) ;
        -- keep WIDTH_RESULT bits of the product, starting at LOCATION_FIRST_RESULT_BIT
        OUT_RESULT <=  std_logic_vector ( signed_multiplier_result ( WIDTH_RESULT + LOCATION_FIRST_RESULT_BIT - 1 downto LOCATION_FIRST_RESULT_BIT ) ) ; 
    end architecture rtl_multiplier ;
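
    For reference, the entity above might be instantiated for the 12 x 16 case roughly as follows. This is only a sketch: the wrapper name mult_bank, its flattened ports, and the choice to keep the full 28-bit product (LOCATION_FIRST_RESULT_BIT = 0) are assumptions made for illustration and are not taken from the actual design.

    ```vhdl
    library ieee ;
    use ieee.std_logic_1164.all ;

    -- Hypothetical wrapper: NUM_MULT signed 12 x 16 multipliers in parallel.
    entity mult_bank is
    generic
    (
        NUM_MULT : positive := 8    -- assumed, matches the 8 multipliers mentioned in the question
    ) ;
    port
    (
        IN_A       : in  std_logic_vector ( NUM_MULT * 12 - 1 downto 0 ) ;
        IN_B       : in  std_logic_vector ( NUM_MULT * 16 - 1 downto 0 ) ;
        OUT_RESULT : out std_logic_vector ( NUM_MULT * 28 - 1 downto 0 )
    ) ;
    end entity mult_bank ;

    architecture rtl_mult_bank of mult_bank is
    begin
        -- one multiplier instance per 12-bit / 16-bit operand slice
        gen_mult : for i in 0 to NUM_MULT - 1 generate
            mult_i : entity work.multiplier
            generic map
            (
                LOCATION_FIRST_RESULT_BIT => 0 ,    -- assumed: keep the product from bit 0
                WIDTH_A      => 12 ,
                WIDTH_B      => 16 ,
                WIDTH_RESULT => 28                  -- assumed: full 12 + 16 bit product
            )
            port map
            (
                IN_A       => IN_A ( 12 * i + 11 downto 12 * i ) ,
                IN_B       => IN_B ( 16 * i + 15 downto 16 * i ) ,
                OUT_RESULT => OUT_RESULT ( 28 * i + 27 downto 28 * i )
            ) ;
        end generate gen_mult ;
    end architecture rtl_mult_bank ;
    ```

    Keeping the widths as generics on each instance preserves the compile-time parameterization that motivated pure HDL here.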

  • Altera_Forum

    In my experience, Quartus uses all available DSP blocks before it starts packing multipliers. See the discussion of the same topic on Edaboard:

    http://www.edaboard.com/showthread.php?t=368754

    I managed to fill all 25 DSP blocks of a Cyclone V A2 with this test, which infers n = 50 multipliers of 18 x 18 bits:

    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;

    entity test1 is
    generic(
        n : integer := 50;  -- number of multipliers
        w : integer := 18   -- operand width in bits
    );
    port(
        clk : in STD_LOGIC;
        sel : in integer range 0 to n-1;
        ax  : in signed(w-1 downto 0);
        bx  : in signed(w-1 downto 0);
        cx  : out signed(2*w-1 downto 0)
    );
    end test1;

    architecture rtl of test1 is
        type ar18 is array(0 to n-1) of signed(w-1 downto 0);
        type ar36 is array(0 to n-1) of signed(2*w-1 downto 0);
        signal ar : ar18;
        signal br : ar18;
        signal cr : ar36;
    begin
        process (clk)
        begin
            if rising_edge(clk) then
                for i in 0 to n-1 loop
                    -- one registered multiplier per array element
                    cr(i) <= ar(i)*br(i);
                    -- sel picks which operand pair is loaded and which product is read out
                    if i = sel then
                        ar(i) <= ax;
                        br(i) <= bx;
                        cx <= cr(i);
                    end if;
                end loop;
            end if;
        end process;
    end rtl;
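
    To compare this with the original question, the same test could be scaled down to 8 multipliers so that the device is far from full. The wrapper below is only a sketch: the name test1_small and the choice of w = 16 are assumptions, and it makes no claim about how many DSP blocks the fitter will report for it.

    ```vhdl
    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;

    -- Hypothetical top level: test1 reduced to 8 multipliers of 16 x 16 bits.
    entity test1_small is
    port(
        clk : in STD_LOGIC;
        sel : in integer range 0 to 7;
        ax  : in signed(15 downto 0);
        bx  : in signed(15 downto 0);
        cx  : out signed(31 downto 0)
    );
    end test1_small;

    architecture rtl of test1_small is
    begin
        -- override the generics of test1; port widths follow from n and w
        u_test1 : entity work.test1
            generic map (n => 8, w => 16)
            port map (clk => clk, sel => sel, ax => ax, bx => bx, cx => cx);
    end rtl;
    ```

    Compiling this next to the n = 50 version and comparing the DSP block counts in the two fitter reports would show whether packing only happens once the blocks run out.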