cjak (Occasional Contributor)
29 days ago

Timing analysis - long combinational path

Hi,

Running Timing Analyzer, I get violations due to long combinational paths. In the Technology Map Viewer, the failing path passes through three blocks:

leftmost block = register bank holding a configurable value used by the other two modules

center/rightmost blocks = two identical modules that use the register value

I can see the long path, but I do not understand why it is implemented like this. Why is the register value routed through dec_filter:15 to dec_filter:9, instead of coming directly from the register bank module on the left?

Is there anything I can do to force a different implementation?

15 Replies

  • sstrell (Super Contributor)

    Without seeing the code and just going by the names on the logic, you're performing a number of math operations and comparisons on the source signal, adding additional levels of logic.  Reviewing and adjusting the RTL code would probably be the easiest solution.  Post some code.

  • cjak (Occasional Contributor)
      p_filter_fsm : process (clk) is
      begin
        if rising_edge(clk) then
          if (rst = '1') then
            coef_addr_cnt            <= i_config_filterlength_reg(coef_addr_cnt'range) - 1;
            decimation_cnt           <= (others => '0');
            accRAM_addr_in           <= i_config_q_factor_reg(accRAM_addr_in'range) - 1;
            prev_accRAM_addr_in      <= i_config_q_factor_reg(accRAM_addr_in'range) - 1;
            accRAM_addr_out          <= i_config_q_factor_reg(accRAM_addr_out'range) - 1;
            prev_accRAM_addr_out     <= i_config_q_factor_reg(accRAM_addr_out'range) - 1;
            run_filter_cnt           <= (others => '0');
            we_accRAM                <= '0';
            accRAM_Re_out_add        <= (others => '0');
            accRAM_Im_out_add        <= (others => '0');
            filtered_accRAM_addr_out <= (others => '0');
            o_filtered_sample.tvalid <= '0';
            o_filtered_sample.tlast  <= '0';
            state_filter             <= NEXT_SAMPLE;
          else
    
            -- Defaults
            accRAM_Re_out_add        <= accRAM_Re_out;
            accRAM_Im_out_add        <= accRAM_Im_out;
            we_accRAM                <= '0';
            o_filtered_sample.tdata(4*GC_CH_DATA_WIDTH-1 downto 0) <= (others => '0');
            o_filtered_sample.tvalid <= '0';
            o_filtered_sample.tlast  <= '0';
    
            -- FSM
            case state_filter is
              -------------------------------------------------------------------
              when NEXT_SAMPLE =>
                accRAM_addr_in  <= prev_accRAM_addr_in;
                accRAM_addr_out <= prev_accRAM_addr_out;
                run_filter_cnt  <= (others => '0');
                we_accRAM       <= '0';
    
                if (sample_tvalid_posedge = '1') then
                  sample_in      <= sample_s1.tdata(GC_CH_DATA_WIDTH-1 downto 0);
                  coef_addr_cnt  <= coef_addr_cnt - to_integer(i_config_d_factor_reg);
                  decimation_cnt <= decimation_cnt + 1;
                  state_filter   <= FILTERING;
                end if;
    
              -------------------------------------------------------------------
              when FILTERING =>
                mult_Re     <= std_logic_vector(resize(signed(sample_in) * signed(i_coef_data(2*GC_COEF_DATA_WIDTH-1 downto GC_COEF_DATA_WIDTH)), mult_Re'length));
                mult_Im     <= std_logic_vector(resize(signed(sample_in) * signed(i_coef_data(GC_COEF_DATA_WIDTH-1 downto 0)), mult_Im'length));
                mult_Re_ext <= std_logic_vector(resize(signed(mult_Re), mult_Re_ext'length));
                mult_Im_ext <= std_logic_vector(resize(signed(mult_Im), mult_Im_ext'length));
    
                -- Truncate number of bits defined in 'config_lsb_prod_reg'
                case i_config_lsb_prod_reg(2 downto 0) is
                  when b"000" =>
                    add_Re_in <= mult_Re_ext(GC_ADDER_WIDTH-1 downto 0);  -- (35:0)
                    add_Im_in <= mult_Im_ext(GC_ADDER_WIDTH-1 downto 0);
                  when b"001" =>
                    add_Re_in <= mult_Re_ext(GC_ADDER_WIDTH downto 1);    -- (36:1)
                    add_Im_in <= mult_Im_ext(GC_ADDER_WIDTH downto 1);
                  when b"010" =>
                    add_Re_in <= mult_Re_ext(GC_ADDER_WIDTH+1 downto 2);  -- (37:2)
                    add_Im_in <= mult_Im_ext(GC_ADDER_WIDTH+1 downto 2);
                  when b"011" =>
                    add_Re_in <= mult_Re_ext(GC_ADDER_WIDTH+2 downto 3);  -- (38:3)
                    add_Im_in <= mult_Im_ext(GC_ADDER_WIDTH+2 downto 3);
                  when b"100" =>
                    add_Re_in <= mult_Re_ext(GC_ADDER_WIDTH+3 downto 4);  -- (39:4)
                    add_Im_in <= mult_Im_ext(GC_ADDER_WIDTH+3 downto 4);
                  when others =>
                    add_Re_in <= mult_Re_ext(GC_ADDER_WIDTH-1 downto 0);  -- same as (b"000")
                    add_Im_in <= mult_Im_ext(GC_ADDER_WIDTH-1 downto 0);
                end case;
    
                -- Counter for controlling length of Filtering-state
                run_filter_cnt <= run_filter_cnt + 1;
    
                -- Read previous sum from accRAM
                if (run_filter_cnt > i_config_q_factor_reg-1) then
                  accRAM_addr_out <= filtered_accRAM_addr_out;
                elsif (run_filter_cnt > 0) then
                  -- Rotating accRAM-address counter for reading from accRAM
                  if (accRAM_addr_out = 0) then
                    accRAM_addr_out <= i_config_q_factor_reg(accRAM_addr_out'range) - 1;
                  else
                    accRAM_addr_out <= accRAM_addr_out - 1;
                  end if;
                end if;
    
                -- Add new sum and store in accRAM
                if (run_filter_cnt > C_START_WRITING_NEW_SUM+i_config_q_factor_reg) then
                  we_accRAM <= '0';
                elsif (run_filter_cnt > C_START_WRITING_NEW_SUM) then
                  we_accRAM    <= '1';
                  accRAM_Re_in <= std_logic_vector(signed(add_Re_in) + signed(accRAM_Re_out_add));
                  accRAM_Im_in <= std_logic_vector(signed(add_Im_in) + signed(accRAM_Im_out_add));
                  if (run_filter_cnt > C_START_WRITING_NEW_SUM+1) then
                    -- Rotating accRAM-address counter for writing to accRAM
                    if (accRAM_addr_in = 0 and we_accRAM = '1') then
                      accRAM_addr_in <= i_config_q_factor_reg(accRAM_addr_in'range) - 1;
                    else
                      accRAM_addr_in <= accRAM_addr_in - 1;
                    end if;
                  end if;
                end if;
    
                -- Update address-counter for coef-RAM, until last overlap is reached
                if (run_filter_cnt < i_config_q_factor_reg-1) then
                  if (coef_addr_cnt < i_config_d_factor_reg) then
                    coef_addr_cnt <= i_config_filterlength_reg(coef_addr_cnt'range) - 1;
                  else
                    coef_addr_cnt <= coef_addr_cnt - to_integer(i_config_d_factor_reg);
                  end if;
                end if;
    
                -- Control length of FILTERING-state
                if (run_filter_cnt = i_config_q_factor_reg+8) then
                  if (decimation_cnt < i_config_d_factor_reg) then
                    coef_addr_cnt <= i_config_filterlength_reg(coef_addr_cnt'range) -1 -decimation_cnt;  -- coef in first overlap
                    state_filter  <= NEXT_SAMPLE;
                  elsif (decimation_cnt = i_config_d_factor_reg) then
                    state_filter <= FILTER_OUTPUT;
                  end if;
                end if;
    
              -------------------------------------------------------------------
              when FILTER_OUTPUT =>
                o_filtered_sample.tdata  <= accRAM_Re_Out & accRAM_Im_out;
                o_filtered_sample.tvalid <= '1';
                o_filtered_sample.tlast  <= '0';
    
                if (i_config_q_factor_reg = 1) then
                  filtered_accRAM_addr_out <= (others => '0');
                elsif (filtered_accRAM_addr_out > i_config_q_factor_reg-2) then
                  filtered_accRAM_addr_out <= (others => '0');
                else
                  filtered_accRAM_addr_out <= filtered_accRAM_addr_out + 1;
                end if;
    
                -- Flush cell which is read, in accRAM
                we_accRAM      <= '1';
                accRAM_addr_in <= filtered_accRAM_addr_out;
                accRAM_Re_in   <= (others => '0');
                accRAM_Im_in   <= (others => '0');
    
                -- Update coefficient for next sample
                coef_addr_cnt <= i_config_filterlength_reg(coef_addr_cnt'range) - 1;
    
                -- Reset decimation counter
                decimation_cnt <= (others => '0');
    
                -- Store accRAM-address value for next round of filtering before
                -- changing state
                prev_accRAM_addr_in  <= accRAM_addr_in;
                prev_accRAM_addr_out <= accRAM_addr_out;
                state_filter         <= NEXT_SAMPLE;
    
              when others => state_filter <= NEXT_SAMPLE;
            end case;
    
          end if;
        end if;
      end process p_filter_fsm;

    The code above is the filter state machine, which uses config_q_factor.

  • sstrell (Super Contributor)

    This code isn't showing any of the signal processing from your screenshot.  Where is dec_filter?

  • cjak (Occasional Contributor)

    Yes, this is the filter. Both the multiply and the accumulate happen in the FILTERING state, along with truncation. In addition, there is a RAM holding the accumulated sum. Input samples are not stored as they come in.

      o_coef_addr <= std_logic_vector(coef_addr_cnt);
    
      -- Accumulator RAM
      p_accRAM : process (clk) is
      begin
        if rising_edge(clk) then
          if (we_accRAM = '1') then
            accRAM_Re(to_integer(accRAM_addr_in)) <= accRAM_Re_in;
            accRAM_Im(to_integer(accRAM_addr_in)) <= accRAM_Im_in;
          end if;
          accRAM_Re_out <= accRAM_Re(to_integer(accRAM_addr_out));
          accRAM_Im_out <= accRAM_Im(to_integer(accRAM_addr_out));
        end if;
      end process p_accRAM;
    
      -- Sample input-samples and detect tvalid-flanks
      p_edge : process (clk) is
      begin
        if rising_edge(clk)  then
          sample_s1.tvalid <= i_sample.tvalid;
          sample_s1.tdata  <= i_sample.tdata;
    
          if (i_sample.tvalid = '1' and sample_s1.tvalid = '0') then
            sample_tvalid_posedge <= '1';
            sample_tvalid_negedge <= '0';
          elsif (i_sample.tvalid = '0' and sample_s1.tvalid = '1') then
            sample_tvalid_negedge <= '1';
            sample_tvalid_posedge <= '0';
          else
            sample_tvalid_posedge <= '0';
            sample_tvalid_negedge <= '0';
          end if;
        end if;
      end process p_edge;

     

  • sstrell (Super Contributor)

    A few things.

    You have way too much logic in your state machine.  The next state logic should simply determine the conditions for switching state (determined in separate processes or elsewhere) and specify what the next state should be.  Setting outputs in particular states should be in their own separate non-clocked process so the appropriate output appears as soon as you move to the new state.  If you're reliant on some logic within a state to then determine a new value before then assigning an output, that's going to cause timing issues.  Create additional states to give yourself extra clock cycles or pull that logic out.  Assignments like these that rely on other logic within the same filtering state are probably the issue and should be in separate combinatorial processes:

    accRAM_Re_in <= std_logic_vector(signed(add_Re_in) + signed(accRAM_Re_out_add));

    accRAM_Im_in <= std_logic_vector(signed(add_Im_in) + signed(accRAM_Im_out_add));

    Sticking to_integer and signed functions in there doesn't help matters either.
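
    One way to pull logic out of the FSM, sketched here under stated assumptions: the posted state machine compares run_filter_cnt against expressions such as i_config_q_factor_reg-1 and i_config_q_factor_reg+8, so each comparison has an adder in front of it. If the configuration value changes rarely, those derived values can be computed once and held in registers. The entity name, port names, and G_WIDTH below are hypothetical and not part of the original design.

    ```vhdl
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    -- Hypothetical helper: registers the values derived from the q-factor so the
    -- FSM compares its counter against plain registers instead of adder outputs.
    entity q_factor_thresholds is
      generic (
        G_WIDTH : positive := 8   -- assumed width of the configuration register
      );
      port (
        clk               : in  std_logic;
        i_q_factor        : in  unsigned(G_WIDTH-1 downto 0);
        o_q_factor_minus1 : out unsigned(G_WIDTH-1 downto 0);
        o_q_factor_plus8  : out unsigned(G_WIDTH downto 0)  -- one extra bit for the carry
      );
    end entity q_factor_thresholds;

    architecture rtl of q_factor_thresholds is
    begin
      p_thresholds : process (clk) is
      begin
        if rising_edge(clk) then
          -- Wraps to all ones when i_q_factor is 0, like the in-line expression
          o_q_factor_minus1 <= i_q_factor - 1;
          -- Widen first so the addition cannot overflow
          o_q_factor_plus8  <= resize(i_q_factor, G_WIDTH+1) + 8;
        end if;
      end process p_thresholds;
    end architecture rtl;
    ```

    The FSM tests would then compare run_filter_cnt directly against the registered outputs. The outputs lag a change of the configuration register by one clock, which only matters if the value can change while the filter is running.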

  • cjak (Occasional Contributor)

    Thanks for the feedback. I will look into rewriting the state machine next week.

    But how do you avoid type casts and conversions such as to_integer and signed?

    • sstrell (Super Contributor)

      That may not necessarily be the issue.  Try cleaning up your state machine first.
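
      On the conversion question itself: casts such as signed(...) and std_logic_vector(...) only reinterpret the same bits and normally synthesize to no logic of their own, so removing them is mostly a readability matter. A common way to avoid them is to declare arithmetic signals and ports as signed or unsigned from the start. The minimal sketch below assumes a 36-bit adder (the posted comments suggest GC_ADDER_WIDTH is 36); the entity and port names are hypothetical.

      ```vhdl
      library ieee;
      use ieee.std_logic_1164.all;
      use ieee.numeric_std.all;

      -- Hypothetical registered adder whose ports are already of type signed,
      -- so the arithmetic needs no casts.
      entity acc_add is
        generic (
          G_WIDTH : positive := 36   -- assumed adder width
        );
        port (
          clk   : in  std_logic;
          i_a   : in  signed(G_WIDTH-1 downto 0);
          i_b   : in  signed(G_WIDTH-1 downto 0);
          o_sum : out signed(G_WIDTH-1 downto 0)
        );
      end entity acc_add;

      architecture rtl of acc_add is
      begin
        p_add : process (clk) is
        begin
          if rising_edge(clk) then
            -- Same adder as std_logic_vector(signed(a) + signed(b));
            -- the result wraps on overflow in both forms.
            o_sum <= i_a + i_b;
          end if;
        end process p_add;
      end architecture rtl;
      ```

      A cast is then needed only at the boundary to blocks that require std_logic_vector, such as a RAM data port.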

  • cjak (Occasional Contributor)

    Hi, I rewrote my state machine, but I did not gain much performance.

    Adding a multicycle path constraint on the "semi-static" configuration registers raised Fmax, but I still have violations related to the filter. The calculations have been extracted from the state machine, as shown in the code below.

    The timing analyzer still complains about long combinational paths. The process below performs its calculations on every clock cycle. Would it be a solution to add extra pipeline registers between the calculation stages, and add multicycle constraints to relax the timing requirement?

      p_filter_calculations : process (clk) is
      begin
        if rising_edge(clk) then
          -- Multiplication of input sample with coefficients
          mult_Re     <= std_logic_vector(resize(signed(sample_in) * signed(i_coef_data(2*GC_COEF_DATA_WIDTH-1 downto GC_COEF_DATA_WIDTH)), mult_Re'length));
          mult_Im     <= std_logic_vector(resize(signed(sample_in) * signed(i_coef_data(GC_COEF_DATA_WIDTH-1 downto 0)), mult_Im'length));
          mult_Re_ext <= std_logic_vector(resize(signed(mult_Re), mult_Re_ext'length));
          mult_Im_ext <= std_logic_vector(resize(signed(mult_Im), mult_Im_ext'length));
    
          -- Truncate number of bits defined in 'config_lsb_prod_reg'
          case i_config_lsb_prod_reg(2 downto 0) is
            when b"000" =>
              add_Re_in <= mult_Re_ext(GC_ADDER_WIDTH-1 downto 0);  -- (35:0)
              add_Im_in <= mult_Im_ext(GC_ADDER_WIDTH-1 downto 0);
            when b"001" =>
              add_Re_in <= mult_Re_ext(GC_ADDER_WIDTH downto 1);    -- (36:1)
              add_Im_in <= mult_Im_ext(GC_ADDER_WIDTH downto 1);
            when b"010" =>
              add_Re_in <= mult_Re_ext(GC_ADDER_WIDTH+1 downto 2);  -- (37:2)
              add_Im_in <= mult_Im_ext(GC_ADDER_WIDTH+1 downto 2);
            when b"011" =>
              add_Re_in <= mult_Re_ext(GC_ADDER_WIDTH+2 downto 3);  -- (38:3)
              add_Im_in <= mult_Im_ext(GC_ADDER_WIDTH+2 downto 3);
            when b"100" =>
              add_Re_in <= mult_Re_ext(GC_ADDER_WIDTH+3 downto 4);  -- (39:4)
              add_Im_in <= mult_Im_ext(GC_ADDER_WIDTH+3 downto 4);
            when others =>
              add_Re_in <= mult_Re_ext(GC_ADDER_WIDTH-1 downto 0);  -- same as (b"000")
              add_Im_in <= mult_Im_ext(GC_ADDER_WIDTH-1 downto 0);
          end case;
    
          -- Add new sum and store in accRAM
          if (run_filter_cnt > C_START_WRITING_NEW_SUM and
              run_filter_cnt < C_START_WRITING_NEW_SUM+i_config_q_factor_reg+1) then
            accRAM_Re_in <= std_logic_vector(signed(add_Re_in) + signed(accRAM_Re_out_add));
            accRAM_Im_in <= std_logic_vector(signed(add_Im_in) + signed(accRAM_Im_out_add));
          elsif (next_state = FILTER_OUTPUT) then
            accRAM_Re_in <= (others => '0');
            accRAM_Im_in <= (others => '0');
          end if;
    
        end if;
      end process p_filter_calculations;

     

    • ShengN_altera (Super Contributor)

      It's better to add some pipeline registers in the state machine. A combinational path with too many logic levels is not recommended for a high-performance design, because a combinational path cannot be retimed.

      I think timing will improve a lot after adding pipeline registers.
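
      As one possible place for such a register: in the posted p_filter_calculations process, the multiply, truncate, and add stages are already registered, but the write-window test (run_filter_cnt compared against C_START_WRITING_NEW_SUM and C_START_WRITING_NEW_SUM+i_config_q_factor_reg+1) is an add followed by a compare that controls the wide accumulator registers. The sketch below registers that test as a single-bit flag. The entity and port names and G_WIDTH are hypothetical, and the bounds are assumed to be registered values computed elsewhere.

      ```vhdl
      library ieee;
      use ieee.std_logic_1164.all;
      use ieee.numeric_std.all;

      -- Hypothetical helper: evaluates the write-window test and registers the
      -- single-bit result, so the wide accumulator registers are enabled by a
      -- flip-flop output rather than by an adder/comparator chain.
      entity write_window_flag is
        generic (
          G_WIDTH : positive := 8   -- assumed counter width
        );
        port (
          clk         : in  std_logic;
          rst         : in  std_logic;
          i_cnt       : in  unsigned(G_WIDTH-1 downto 0);  -- e.g. run_filter_cnt
          i_first     : in  unsigned(G_WIDTH-1 downto 0);  -- registered lower bound (exclusive)
          i_last      : in  unsigned(G_WIDTH-1 downto 0);  -- registered upper bound (exclusive)
          o_in_window : out std_logic                      -- valid one clock after i_cnt
        );
      end entity write_window_flag;

      architecture rtl of write_window_flag is
      begin
        p_flag : process (clk) is
        begin
          if rising_edge(clk) then
            if rst = '1' then
              o_in_window <= '0';
            elsif i_cnt > i_first and i_cnt < i_last then
              o_in_window <= '1';
            else
              o_in_window <= '0';
            end if;
          end if;
        end process p_flag;
      end architecture rtl;
      ```

      The flag lags the counter by one clock, so either both bounds are lowered by one or the data path is delayed by one stage to keep the accumulate aligned. Whether this removes the reported violation depends on which path the timing report actually lists as failing.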

  • cjak (Occasional Contributor)

    Thanks, but I believe I have tried that too. I still get violations related to the calculations.

    Is there something fundamental I have misunderstood about FSMs? Or maybe about the Quartus Timing Analyzer?

    My FSM now looks like this:

      -----------------------------------------------------------------------------
      -- Filter statemachine
      --
      -- Avoid start filtering in the "middle" of tvalid, if reset is released at a
      -- bad time => need both negative edge and a positive edge on the next tvalid
      -- in order to start filtering.
      -----------------------------------------------------------------------------
      p_filter_fsm : process (clk) is
      begin
        if rising_edge(clk) then
          if (rst = '1') then
            current_state <= IDLE;
          else
            current_state <= next_state;
          end if;
        end if;
      end process p_filter_fsm;
    
      -----------------------------------------------------------------------------
      -- Counters for handling accRAM-accesses during filtering
      -----------------------------------------------------------------------------
      p_filter_fsm_counters : process (clk) is
      begin
        if rising_edge(clk) then
          if (rst = '1') then
            accRAM_addr_in           <= i_config_q_factor_reg(accRAM_addr_in'range) - 1;
            accRAM_addr_out          <= i_config_q_factor_reg(accRAM_addr_out'range) - 1;
            prev_accRAM_addr_in      <= i_config_q_factor_reg(accRAM_addr_in'range) - 1;
            prev_accRAM_addr_out     <= i_config_q_factor_reg(accRAM_addr_out'range) - 1;
            coef_addr_cnt            <= i_config_filterlength_reg(coef_addr_cnt'range) - 1;
            filtered_accRAM_addr_out <= (others => '0');
            decimation_cnt           <= (others => '0');
            run_filter_cnt           <= (others => '0');
            sample_in                <= (others => '0');
          else
    
            -- Defaults
            run_filter_cnt <= (others => '0');
    
            case current_state is
              when IDLE =>
                prev_accRAM_addr_in  <= i_config_q_factor_reg(accRAM_addr_in'range) - 1;
                prev_accRAM_addr_out <= i_config_q_factor_reg(accRAM_addr_out'range) - 1;
                accRAM_addr_in       <= i_config_q_factor_reg(accRAM_addr_in'range) - 1;
                accRAM_addr_out      <= i_config_q_factor_reg(accRAM_addr_out'range) - 1;
                coef_addr_cnt        <= i_config_filterlength_reg(coef_addr_cnt'range) - 1;
                decimation_cnt       <= (others => '0');
    
              -------------------------------------------------------------------
              when NEXT_SAMPLE =>
                accRAM_addr_in  <= prev_accRAM_addr_in;
                accRAM_addr_out <= prev_accRAM_addr_out;
    
                if (sample_tvalid_posedge = '1' or sample_tvalid_negedge = '1') then
                  sample_in      <= sample_s1.tdata(GC_CH_DATA_WIDTH-1 downto 0);
                  coef_addr_cnt  <= coef_addr_cnt - to_integer(i_config_d_factor_reg);
                  decimation_cnt <= decimation_cnt + 1;
                end if;
    
              -------------------------------------------------------------------         
              when FILTERING =>
                -- Counter for controlling length of Filtering-state
                run_filter_cnt <= run_filter_cnt + 1;
    
                -- Update address-counter for coef-RAM, until last overlap is reached
                if (run_filter_cnt < i_config_q_factor_reg-1) then
                  if (coef_addr_cnt < i_config_d_factor_reg) then
                    coef_addr_cnt <= i_config_filterlength_reg(coef_addr_cnt'range) - 1;
                  else
                    coef_addr_cnt <= coef_addr_cnt - to_integer(i_config_d_factor_reg);
                  end if;
                end if;
    
                -- Control length of FILTERING-state
                if (run_filter_cnt = i_config_q_factor_reg+8) then
                  if (decimation_cnt < i_config_d_factor_reg) then
                    coef_addr_cnt <= i_config_filterlength_reg(coef_addr_cnt'range) -1 -decimation_cnt;  -- coef in first overlap
                  end if;
                end if;
    
                -- Read previous sum from accRAM
                if (run_filter_cnt > i_config_q_factor_reg-1) then
                  accRAM_addr_out <= filtered_accRAM_addr_out;
                elsif (run_filter_cnt > 0) then
                  -- Rotating accRAM-address counter for reading from accRAM
                  if (accRAM_addr_out = 0) then
                    accRAM_addr_out <= i_config_q_factor_reg(accRAM_addr_out'range) - 1;
                  else
                    accRAM_addr_out <= accRAM_addr_out - 1;
                  end if;
                end if;
    
                if (run_filter_cnt > C_START_WRITING_NEW_SUM+1 and
                    run_filter_cnt < C_START_WRITING_NEW_SUM+i_config_q_factor_reg+1) then
                  if (accRAM_addr_in = 0 and we_accRAM = '1') then
                    accRAM_addr_in <= i_config_q_factor_reg(accRAM_addr_in'range) - 1;
                  else
                    accRAM_addr_in <= accRAM_addr_in - 1;
                  end if;
                end if;
    
              ------------------------------------------------------------------- 
              when FILTER_OUTPUT =>
                -- Flush cell which is read, in accRAM
                accRAM_addr_in <= filtered_accRAM_addr_out;
    
                if (i_config_q_factor_reg = 1) then
                  filtered_accRAM_addr_out <= (others => '0');
                elsif (filtered_accRAM_addr_out > i_config_q_factor_reg-2) then
                  filtered_accRAM_addr_out <= (others => '0');
                else
                  filtered_accRAM_addr_out <= filtered_accRAM_addr_out + 1;
                end if;
    
                -- Update coefficient for next sample
                coef_addr_cnt <= i_config_filterlength_reg(coef_addr_cnt'range) - 1;
    
                -- Reset decimation counter
                decimation_cnt <= (others => '0');
                
                -- Store accRAM-address value for next round of filtering before
                -- changing state
                prev_accRAM_addr_in  <= accRAM_addr_in;
                prev_accRAM_addr_out <= accRAM_addr_out;
    
              when others => null;
            end case;
          end if;
        end if;
      end process p_filter_fsm_counters;
    
      -----------------------------------------------------------------------------
      -- Asynchronous FSM outputs
      -----------------------------------------------------------------------------
      p_filter_fsm_outputs : process (all) is
      begin
    
        -- Defaults
        next_state <= current_state;
    
        case current_state is
          when IDLE =>
            next_state <= NEXT_SAMPLE;
    
          -------------------------------------------------------------------
          when NEXT_SAMPLE =>
            if (sample_tvalid_posedge = '1' or sample_tvalid_negedge = '1') then
              next_state <= FILTERING;
            else
              next_state <= NEXT_SAMPLE;
            end if;
    
          -------------------------------------------------------------------
          when FILTERING =>
            -- Control length of FILTERING-state
            if (run_filter_cnt = i_config_q_factor_reg+8) then
              if (decimation_cnt < i_config_d_factor_reg) then
                next_state <= NEXT_SAMPLE;
              elsif (decimation_cnt = i_config_d_factor_reg) then
                next_state <= FILTER_OUTPUT;
              end if;
            else
              next_state <= FILTERING;
            end if;
    
          -------------------------------------------------------------------
          when FILTER_OUTPUT =>
            next_state <= NEXT_SAMPLE;
    
          when others => next_state <= NEXT_SAMPLE;
        end case;
      end process p_filter_fsm_outputs;
    
      -----------------------------------------------------------------------------
      -- FSM pipeline stage
      -----------------------------------------------------------------------------
      p_fsm_pipelining : process (clk) is
      begin
        if rising_edge(clk) then
          if (rst = '1') then
            accRAM_Re_out_add                                      <= (others => '0');
            accRAM_Im_out_add                                      <= (others => '0');
            we_accRAM                                              <= '0';
            o_filtered_sample.tdata(4*GC_CH_DATA_WIDTH-1 downto 0) <= (others => '0');
            o_filtered_sample.tvalid                               <= '0';
            o_filtered_sample.tlast                                <= '0';
    
          else
    
            -- Defaults
            accRAM_Re_out_add                                      <= accRAM_Re_out;
            accRAM_Im_out_add                                      <= accRAM_Im_out;
            we_accRAM                                              <= '0';
            o_filtered_sample.tdata(4*GC_CH_DATA_WIDTH-1 downto 0) <= (others => '0');
            o_filtered_sample.tvalid                               <= '0';
            o_filtered_sample.tlast                                <= '0';
    
            case current_state is
              when IDLE =>
                accRAM_Re_out_add        <= (others => '0');
                accRAM_Im_out_add        <= (others => '0');
                o_filtered_sample.tvalid <= '0';
                o_filtered_sample.tlast  <= '0';
    
              -------------------------------------------------------------------
              when FILTERING =>
                -- Add new sum and store in accRAM
                if (run_filter_cnt > C_START_WRITING_NEW_SUM+i_config_q_factor_reg) then
                  we_accRAM <= '0';
                elsif (run_filter_cnt > C_START_WRITING_NEW_SUM) then
                  we_accRAM <= '1';
                end if;
    
              -------------------------------------------------------------------
              when FILTER_OUTPUT =>
                we_accRAM                <= '1';
                o_filtered_sample.tdata  <= accRAM_Re_Out & accRAM_Im_out;
                o_filtered_sample.tvalid <= '1';
                o_filtered_sample.tlast  <= '0';
    
              when others => null;
            end case;
    
          end if;
        end if;
      end process p_fsm_pipelining;
    
      -----------------------------------------------------------------------------
      -- Filter-calculations, synchronous
      -----------------------------------------------------------------------------
      p_filter_calculations : process (clk) is
      begin
        if rising_edge(clk) then
          -- Multiplication of input sample with coefficients
          mult_Re <= std_logic_vector(resize(signed(sample_in) * signed(i_coef_data(2*GC_COEF_DATA_WIDTH-1 downto GC_COEF_DATA_WIDTH)), mult_Re'length));
          mult_Im <= std_logic_vector(resize(signed(sample_in) * signed(i_coef_data(GC_COEF_DATA_WIDTH-1 downto 0)), mult_Im'length));
    
          -- Resizing of vectors
          mult_Re_ext <= std_logic_vector(resize(signed(mult_Re), mult_Re_ext'length));
          mult_Im_ext <= std_logic_vector(resize(signed(mult_Im), mult_Im_ext'length));
    
          -- Truncate number of bits defined in 'config_lsb_prod_reg'
          case i_config_lsb_prod_reg(2 downto 0) is
            when b"001" =>
              add_Re_in <= mult_Re_ext(GC_ADDER_WIDTH downto 1);    -- (36:1)
              add_Im_in <= mult_Im_ext(GC_ADDER_WIDTH downto 1);
            when b"010" =>
              add_Re_in <= mult_Re_ext(GC_ADDER_WIDTH+1 downto 2);  -- (37:2)
              add_Im_in <= mult_Im_ext(GC_ADDER_WIDTH+1 downto 2);
            when b"011" =>
              add_Re_in <= mult_Re_ext(GC_ADDER_WIDTH+2 downto 3);  -- (38:3)
              add_Im_in <= mult_Im_ext(GC_ADDER_WIDTH+2 downto 3);
            when b"100" =>
              add_Re_in <= mult_Re_ext(GC_ADDER_WIDTH+3 downto 4);  -- (39:4)
              add_Im_in <= mult_Im_ext(GC_ADDER_WIDTH+3 downto 4);
            when others =>
              add_Re_in <= mult_Re_ext(GC_ADDER_WIDTH-1 downto 0);  -- same as (b"000")
              add_Im_in <= mult_Im_ext(GC_ADDER_WIDTH-1 downto 0);
          end case;
    
          -- Add new sum and store in accRAM
          if (run_filter_cnt > C_START_WRITING_NEW_SUM and
              run_filter_cnt < C_START_WRITING_NEW_SUM+i_config_q_factor_reg+1) then
            accRAM_Re_in <= std_logic_vector(signed(add_Re_in) + signed(accRAM_Re_out_add));
            accRAM_Im_in <= std_logic_vector(signed(add_Im_in) + signed(accRAM_Im_out_add));
          elsif (next_state = FILTER_OUTPUT) then
            accRAM_Re_in <= (others => '0');
            accRAM_Im_in <= (others => '0');
          end if;
    
        end if;
      end process p_filter_calculations;

     

    • ShengN_altera (Super Contributor)

      Could you check the retime stage report and the fast forward timing closure report for the path that needs registers?

    • ShengN_altera (Super Contributor)

      Has your problem been resolved? Do you need further help? If so, could you provide your project file?

      • cjak (Occasional Contributor)

        No, I have not achieved timing closure, and the redesigned filter seems to achieve a lower Fmax than the original implementation. Unfortunately, I cannot share the design.