Vic3Dexe
Occasional Contributor
3 years ago

SDC constraints for async static RAM

I have async static RAM with 10 ns access time.

I read data from it as follows: the address is formed in one register and the data is latched into another, and both use the same 50 MHz (20 ns) clock. nCE for the chip is always 0, i.e. always enabled, and nRD is also always 0. Writes are rare and work fine.

The problem is that reads from the SRAM fail from time to time, so part of the data is corrupted.

The actual problem is the delay inside the FPGA (Cyclone III) from the data pins to the data register. TimeQuest says it is about 6 ns (!!!), so (access time + address delay from register to port + data delay from port to register) > 20 ns, and the data is not always latched in time. Trace delay on the board is negligible.

I assume I need to write some constraints, but I'm really confused about how to do this, and I can't find any examples for this case. So the actual question: how do I write these **bleep** constraints?

12 Replies

  • ak6dn
    Regular Contributor

    Here is my setup on a Terasic DE1 board, which uses a Cyclone II FPGA and an attached 10 ns async SRAM device.
    There is a 50MHz (20ns) clock input that is transformed thru a PLL to an 80MHz (12.5ns) clock for the logic.
    I can perform memory tests and read and write this SRAM continuously (for days on end...) with no errors occurring.

    FYI the SRAM is used as the main memory for an FPGA PDP-8 implementation

    
    # Input 50MHz reference clock
    
    create_clock -period 20.0 -name CLOCK_50 [get_ports {CLOCK_50}]
    
    # Created clocks based on PLLs (CPUCLK = 80MHz)
    
    create_generated_clock -source {pll|altpll_component|pll|inclk[0]} -divide_by 5 -multiply_by 8 -duty_cycle 50 -name CPUCLK {pll|altpll_component|pll|clk[0]}
    
    ### external async SRAM timing ###
    
    # address/control outputs
    
    set_output_delay -clock CPUCLK -clock_fall -max 4.0 [get_ports {SRAM_*_L SRAM_A[*]}]
    set_output_delay -clock CPUCLK -clock_fall -min 0.5 [get_ports {SRAM_*_L SRAM_A[*]}]
    
    # write data outputs
    
    set_output_delay -clock CPUCLK -max 3.0 [get_ports {SRAM_DQ[*]}]
    set_output_delay -clock CPUCLK -min 0.5 [get_ports {SRAM_DQ[*]}]
    set_multicycle_path -rise_from CPUCLK -to [get_ports {SRAM_DQ[*]}] -setup 2
    set_multicycle_path -rise_from CPUCLK -to [get_ports {SRAM_DQ[*]}] -hold 2
    
    # read data inputs
    
    set_input_delay -clock CPUCLK -max 10.0 [get_ports {SRAM_DQ[*]}]
    set_input_delay -clock CPUCLK -min  3.0 [get_ports {SRAM_DQ[*]}]
    set_multicycle_path -from [get_ports {SRAM_DQ[*]}] -rise_to CPUCLK -setup 2
    set_multicycle_path -from [get_ports {SRAM_DQ[*]}] -rise_to CPUCLK -hold 2
    
    

    And for reference, here is the Verilog implementation it refers to...

    module mm8e_memory
        #(
          // external parameters
    
          parameter		TPD = 0,		// simulation delay
          parameter		INTBANKS = 2,		// memory size, 4K banks (internal memory)
          parameter		EXTBANKS = 6		// memory size, 4K banks (external memory)
    
          )
        (
         // port definitions
    
         input wire		clk,			// system clock
         input wire 	reset,			// system reset
    
         input wire 	init,			// bus init
    
         input wire 	mr,			// memory read
         input wire 	mw,			// memory write
         input wire [0:2] 	ema,			// extended memory address
         input wire [0:11] 	ma,			// memory address
    
         inout wire [0:11] 	md,			// memory data in/out
    
         output reg [14:0] 	ext_addr,		// external memory address
         output reg 	ext_we_l,		// external memory write enable
         output reg 	ext_ce_l,		// external memory select
         output reg 	ext_oe_l,		// external memory read enable
    
         inout wire [11:0] 	ext_dq			// external memory data in/out
    
         );
    
        // internal parameters
    
        localparam
    	MEMSIZE = 4096*INTBANKS;		// internal memory size
    
        // local signals
    
        wire [14:0] 	addr = {ema[0:2],ma[0:11]}; // full memory address
    
        reg [0:11] 		memory [0:MEMSIZE-1] /* synthesis ramstyle = "no_rw_check" */;
        reg [0:11] 		mdo;
        wire [0:11] 	mdi = md;
        reg 		mrd;
    
        wire 		enb_int = (INTBANKS > 0) && (ema <= INTBANKS-1);
        wire 		enb_ext = (EXTBANKS > 0) && (ema >= INTBANKS) && (ema <= INTBANKS+EXTBANKS-1);
    
        // internal memory
    
        initial
    	$readmemb("meminit.txt", memory, 0, MEMSIZE-1);
    
        always @(posedge clk) mrd <= #TPD mr & enb_int;
    
        wire 		mwr = mw & enb_int;
    
        always @(posedge clk)
    	begin
            if (mwr) memory[addr] <= #TPD mdi;
            mdo <= #TPD memory[addr];
    	end
    
        assign 		md = mr & mrd ? mdo : {12{1'bz}};
    
        // external memory
    
        always @(negedge clk)
    	begin
      	ext_addr <= #TPD addr;
    	ext_ce_l <= #TPD 1'b0;
    	ext_we_l <= #TPD ~( mw & ~mr) | ~ext_we_l;
    	ext_oe_l <= #TPD ~(~mw &  mr);
    	end
        
        assign 		ext_dq = mw & ~mr ? mdi : {12{1'bz}};
        assign 		md = mr & ~mw & enb_ext ? ext_dq : {12{1'bz}};
    
    endmodule // mm8e_memory
  • Vic3Dexe
    Occasional Contributor

    In your example md is assigned twice. Is that OK?

        assign 		md = mr & mrd ? mdo : {12{1'bz}};
    ...
        assign 		md = mr & ~mw & enb_ext ? ext_dq : {12{1'bz}};

    And mdo is actually read from the internal memory, not the external one:

       always @(posedge clk)
    	begin
            if (mwr) memory[addr] <= #TPD mdi;
            mdo <= #TPD memory[addr];
    	end

    while the data from the external memory (as far as I understand) is not registered; the md lines are just outputs of the module:

       assign 		ext_dq = mw & ~mr ? mdi : {12{1'bz}};
        assign 		md = mr & ~mw & enb_ext ? ext_dq : {12{1'bz}};

    So this is not my case; my problem starts when I try to register the md lines.

    And in these lines

    set_input_delay -clock CPUCLK -max 10.0 [get_ports {SRAM_DQ[*]}]
    set_input_delay -clock CPUCLK -min  3.0 [get_ports {SRAM_DQ[*]}]
    set_multicycle_path -from [get_ports {SRAM_DQ[*]}] -rise_to CPUCLK -setup 2
    set_multicycle_path -from [get_ports {SRAM_DQ[*]}] -rise_to CPUCLK -hold 2

    What are 10 and 3? OK, let's assume 10 is the 10 ns access time. So what is 3? Hold time? Why 3? Shouldn't it be negative?

    Multicycle is 2... why? In your code you set the address at negedge and latch the data at posedge (let's assume you are reading the external RAM). Doesn't all of that count as only 1 cycle?

    Same in these lines

    set_output_delay -clock CPUCLK -clock_fall -max 4.0 [get_ports {SRAM_*_L SRAM_A[*]}]
    set_output_delay -clock CPUCLK -clock_fall -min 0.5 [get_ports {SRAM_*_L SRAM_A[*]}]

    Where do 4 and 0.5 come from?

    I appreciate the help, but I don't want to just copy-paste; I want to understand what I'm copy-pasting )

  • ak6dn
    Regular Contributor

    I was not intending to provide THE solution to your problem, only HOW I implemented my solution, to show how to apply SDC constraints. I did not intend it to be a cut and paste solution for you.

    In my particular case, there are two memories on the same bus, an internal memory implemented via block rams, and an external memory implemented in the 256KB async SRAM device attached to the FPGA. There was not enough internal block ram available to build the entire memory (32K x 12 bit) using internal block ram, so I split it and have the low 8K internal, the upper 24K external.

    Yes, the md lines are assigned twice, as a tri-state bus with mutually exclusive enable signals.

    Yes, mdo is a register that only clocks the output data of the internal block ram.

    md lines are registered at the next higher level to this module (at the posedge of clk).

    The timing setup/hold numbers were based on the data sheet specs of the SRAM device on the board.

    Multicycle is 2 since it is not realistic to drive a 10ns access SRAM device using a 12.5ns clock period.

    If you believe you can meet timing using a 10ns device on a 20ns clock period, then you only need set_input_delay and set_output_delay. No multicycle_path statement needed.
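
    A minimal sketch of what "only set_input_delay and set_output_delay" could look like for the single-cycle case on a 20 ns clock. None of the numbers come from a datasheet: the 7 ns clock-to-pin budget for the address and the 3 ns output hold figure are assumptions, and the port names clk50 and addr_pin[*] are hypothetical (data_pin[*] and the clock name 50M are the ones used elsewhere in this thread). The two constraints are analyzed separately, so the 20 ns period has to be split by hand between the address path, the SRAM, and the data path.

    ```tcl
    # 50 MHz base clock (port name clk50 is hypothetical)
    create_clock -name 50M -period 20.0 [get_ports {clk50}]

    # Address outputs: give the FPGA at most 7 ns from clock edge to pin (assumed budget).
    # -max = period - budget = 20 - 7 = 13
    set_output_delay -clock 50M -max 13.0 [get_ports {addr_pin[*]}]
    # -min 0: no extra hold requirement on the address (assumption)
    set_output_delay -clock 50M -min 0.0 [get_ports {addr_pin[*]}]

    # Read data: valid at the FPGA pins no later than
    # 7 ns (address budget) + 10 ns (access time) = 17 ns after the launching edge,
    # which leaves 20 - 17 = 3 ns for pin-to-register delay including setup.
    set_input_delay -clock 50M -max 17.0 [get_ports {data_pin[*]}]
    # Earliest change: address at 0 ns plus an assumed 3 ns SRAM output hold time
    set_input_delay -clock 50M -min 3.0 [get_ports {data_pin[*]}]
    ```

    With a budget like this, a pin-to-register path of about 6 ns would be reported as a setup failure, which at least makes the problem visible in the timing report instead of in corrupted data. Write-side constraints for the data pins are not covered here.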

    • Vic3Dexe
      Occasional Contributor

      @ak6dn wrote:

      I was not intending to provide THE solution to your problem, only HOW I implemented my solution, to show how to apply SDC constraints. I did not intend it to be a cut and paste solution for you.


      Oh, man, I'm sorry. It's my bad English. I didn't mean that you should provide a copy-paste solution for me.

      I mean I don't understand how your solution works, so I can't use it to produce my solution )


      @ak6dn wrote:

      md lines are registered at the next higher level to this module (at the posedge of clk).

      Multicycle is 2 since it is not realistic to drive a 10ns access SRAM device using a 12.5ns clock period.

      If you believe you can meet timing using a 10ns device on a 20ns clock period, then you only need set_input_delay and set_output_delay. No multicycle_path statement needed.


      Sounds more reasonable to me. So you keep registering the data at posedge, but wait 1 extra clk period before actually using it?

      Well, I can lower the frequency too, I just don't want to do this.


      @ak6dn wrote:

      The timing setup/hold numbers were based on the data sheet specs of the SRAM device on the board.

      How exactly are they derived from it? I tried some formulas I found on Google, but got nonsense.
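
      One common way to get from datasheet numbers to SDC values is sketched below as plain Tcl arithmetic. This is an illustration under assumptions, not the derivation ak6dn used: the helper name sram_read_delays is hypothetical, tAA (address access time) and tOH (output hold after address change) are the usual async SRAM datasheet parameters, and the address clock-to-pin budget is a number the designer picks.

      ```tcl
      # Hypothetical helper: all arguments and results are in ns.
      #   tclk       clock period
      #   tco_budget allowed FPGA clock-to-pin delay for the address (designer's choice)
      #   taa        SRAM address access time (datasheet)
      #   toh        SRAM output hold time after an address change (datasheet)
      #   trace      one-way board trace delay
      proc sram_read_delays {tclk tco_budget taa toh trace} {
          # Latest valid data at the FPGA pin: address out, trace, access time, trace back
          set in_max  [expr {$tco_budget + $taa + 2.0 * $trace}]
          # Earliest data change: address assumed to change at 0 ns (conservative), then tOH
          set in_min  [expr {$toh + 2.0 * $trace}]
          # Output delay value that enforces the address budget
          set out_max [expr {$tclk - $tco_budget}]
          # What is left for the FPGA pin-to-register path (including setup)
          set fpga_in [expr {$tclk - $in_max}]
          return [dict create in_max $in_max in_min $in_min out_max $out_max fpga_in $fpga_in]
      }

      # Assumed example: 20 ns clock, 7 ns address budget, tAA = 10, tOH = 3, negligible trace
      puts [sram_read_delays 20.0 7.0 10.0 3.0 0.0]
      ```

      The in_max and in_min results would go into set_input_delay -max / -min on the data pins, and out_max into set_output_delay -max on the address pins. If fpga_in comes out near zero or negative, as it would for a 10 ns device on a 12.5 ns clock, a single-cycle read cannot close and something like a multicycle path or a slower clock is needed.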

  • Nurina
    Regular Contributor

    Hi,


    Could you try putting your data_reg at the I/O? This would reduce the delay.


    Regards,

    Nurina


    • Vic3Dexe
      Occasional Contributor
      @Nurina wrote:

      Could you try putting your data_reg at the I/O? This would reduce the delay.


      I will try, thx.

  • Vic3Dexe
    Occasional Contributor

    And I forgot to mention.

    Yesterday I added

    set_max_delay -from [get_ports {data_pin[*]}] -to [get_registers {data_reg[*]}] 3.0

    Without this, all data were corrupted. With this line, only a few.

    Then I assigned fast_input_register to data_pin (not data_reg; the fitter keeps ignoring that). Nothing changed. How can I check whether it is actually fast now?
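
    For reference, a sketch of how that assignment usually looks in the project's .qsf file, assuming the top-level port really is named data_pin as in the set_max_delay line above:

    ```tcl
    # .qsf fragment: ask the fitter to pack the capture register into the I/O cell
    set_instance_assignment -name FAST_INPUT_REGISTER ON -to data_pin[*]
    ```

    As far as I know, the place to check the result is the Fitter report, Resource Section, in the Input Pins or Bidir Pins table, which has an "Input Register" column. The register can only be packed into the I/O cell when the pin feeds it directly; any logic between the pin and the register prevents the packing, and the assignment then has no visible effect.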

    Then I changed 3.0 to 2.0, and it works fine.

    So, as I see it, I've manually fixed the delay between the pin and the register. Chip Planner confirms this: the register is now much closer to the pin.

    But this is definitely a bad approach. I think Quartus should do this automatically given correct constraints.

    I've tried to put

    set_input_delay -clock [get_clocks {50M}] -max 8.5 [get_ports data_pin*]
    set_input_delay -clock [get_clocks {50M}] -min 0 [get_ports data_pin*]

    and got a lot of setup slack violations (50M); almost all data are corrupted. For example, one of them: (screenshot not preserved)

    Obviously these constraints are wrong, because they have the opposite effect.

    So, again, the question: how do I tell Quartus to automatically reduce the delay on the data path? What constraints should be there?

    UPD: and I'm stuck again.

    Data were slightly corrupted after some changes, so I changed 2.0 to 1.0. Everything works fine (again), but in Chip Planner data_reg is veeery far away from data_pin, and the data path in TimeQuest reports 3.9 ns (it was about 2.2 before)...

    I don't understand why it is working, what I am doing, or what to do next.

  • Nurina
    Regular Contributor

    Hi,


    May I know if your problem has been resolved?


    Regards,

    Nurina


  • Nurina
    Regular Contributor

    Hello,


    We did not receive any response to the previous reply provided, thus I will put this case to close pending. Please post a response in the next 15 days to allow me to continue to support you. After 15 days, this thread will be transitioned to community support. The community users will be able to help you with your follow-up questions.


    Regards,

    Nurina




  • Vic3Dexe
    Occasional Contributor

    @Nurina

    I really appreciate the help.

    1. Logic Lock is overkill. It should work without such deep intervention. It's a simple RAM, come on!

    2. I found some logic between the data/addr registers and the FPGA ports. I had totally forgotten about it. So I removed this logic (strictly speaking, I moved it behind the registers), and now reading is perfect. I've even removed the set_max_delay lines from the SDC. This is good, because I didn't understand what they did anyway )

    3. But then a new problem arrived: in some cases writes fail, and I don't understand why. Maybe it is again logic on the nWR line, idk. I can't remove this logic right now; I will try some workaround.

    4. About this topic... I asked about constraints for static RAM; the delays are a secondary problem that should be resolved by a correct SDC. I'm now reading this, it is much closer to what I need. Hope it will help. If you want to close this topic, that's OK.

    P.S. One more question: are there any major differences between Cyclone III and IV?

    I mean I/O buffers or something. I did something similar a couple of years ago on Cyclone IV and had no problems at all, without any constraints, fast registers, etc.

  • Vic3Dexe
    Occasional Contributor

    Well, it seems I've forced it to work.

    I've assigned the ADDR pins to fast output, and the DATA pins to fast input/output.

    Then I created a new clock with the PLL; it's the same 50 MHz, but with a 90 degree shift. With this clock I've moved the nWR pulse a little forward.

    Idk what actually helped, the fast registers or the shifted clock, but for now I can't reproduce any errors.
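
    For the timing analysis to know about the shifted clock, it has to exist in the SDC as well. A hedged sketch of the two usual options follows; the clock name clk50_p90 and the PLL pin paths are hypothetical (the paths are modeled on the Cyclone II example earlier in this thread and will differ per design and device family).

    ```tcl
    # Option 1: let TimeQuest create generated clocks for every PLL output,
    # including the phase shift configured in the PLL
    derive_pll_clocks

    # Option 2: describe the shifted output by hand instead (names are hypothetical)
    # create_generated_clock -name clk50_p90 \
    #     -source [get_pins {pll|altpll_component|pll|inclk[0]}] \
    #     -phase 90 \
    #     [get_pins {pll|altpll_component|pll|clk[1]}]
    ```

    Once the shifted clock is defined, the paths launched from it (here, the nWR register) are analyzed against the 0 degree clock with the 5 ns offset taken into account, so the write timing can be checked in the report rather than by trial and error.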

    One thing I can't figure out: why doesn't SignalTap show me the phase shift?

    Here is the screenshot: capture at 200 MHz from the PLL, trigger on the falling edge of nWR.

    Add: it seems it is the shifted clock. I tried switching nWR back to the 0 degree one and got errors.