Forum Discussion

Altera_Forum
Honored Contributor
15 years ago

Weird results assigning memory blocks

Hey Guys,

I have this weird situation (to me at least) and wanted to know if anyone here has any insights on it.

I was trying to synthesize a design on an EP4SGX530 Stratix IV FPGA. I needed 2364 dual-port memory blocks, all accessed in parallel. Each of these memories is very small, with address and data widths of 4 bits each. I looked at the specs for the EP4SGX530 and noticed that it has only 1280 M9Ks, so I figured that, since at most 1280 memories could be read and written simultaneously, it might not be possible.

But I thought, let's try it, so I created the memory modules using the altsyncram megafunction and synthesized the design.

To my surprise, it successfully finished synthesis, place and route!

I checked the fitter report, and it reported that only 264 of the available 1280 M9K blocks were used! This means that it somehow sliced an M9K block into 8 independent dual-port memories with independent clocks and read and write ports, and fitted 2364 memories into these 264 M9Ks. I thought this was not possible!

I even tested the design, and it is working fine, so am I missing something here?

Sina
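
A quick back-of-the-envelope check of the numbers in this post, assuming each memory is 16 words of 4 bits (4-bit address and 4-bit data, as described) and that one M9K holds 9,216 bits including parity:

```latex
\begin{aligned}
\text{bits per memory}          &= 2^{4} \times 4 = 64 \\
\text{total bits}               &= 2364 \times 64 = 151{,}296 \\
\text{M9Ks by raw capacity}     &= 151{,}296 / 9216 \approx 16.4 \\
\text{memories per used M9K}    &= 2364 / 264 \approx 8.95
\end{aligned}
```

Under these assumptions raw capacity is not the constraint. The open question is how roughly nine logically separate memories can share the ports of a single block.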

8 Replies

  • Altera_Forum

    You cannot split an M9K block, as you suggest, into multiple independent true dual-port memories. You can split an M9K into two independent simple dual-port memories, but this does not explain what you observed. I suggest you check that you didn't get any messages during synthesis saying that memory blocks were synthesized away for some reason, although that would not explain why your design apparently functions as you expect. Perhaps QSyn was able to merge multiple memories that you thought were independent into fewer memory blocks than you expected.

  • Altera_Forum

    Thanks, Jimbo ;)

    Please look at the attached fitter report.

    In section 10 (HardCopy Device Resource Guide), you can see that 292/1280 M9K RAMs are used.

    Now if you look at section 26 (Fitter RAM Summary) and add up the numbers in the M9K column, you get a total of 2399! (I added them using Excel.)

    Also, if you look at the locations there, you can see that many M9K locations are used by multiple instances. For example, location M9K_X122_Y86_N0 is used by MEM[0], MEM[1500], MEM[1540], MEM[1775], MEM[1800], MEM[1840], MEM[295], MEM[80], MEM[950] and MEM[965].

    This does not make sense to me at all.

    You said it might have merged memories. How can I check that?

    Thanks,

    Sina
  • Altera_Forum

    Assuming that Quartus doesn't merge RAM blocks erroneously, the RTL must allow it. Without knowing the RTL, it's impossible to determine how the RAM blocks can be merged. You can simply take one example of blocks going to the same location and check which signals are connected in the RTL netlist.

  • Altera_Forum

    Are you inferring the memory, or using the MegaWizard? If the former, then check the section in the Quartus II Handbook on synthesis regarding inferring dual-port memory. Your RTL code must be coded in the specific manner detailed in the handbook for this to work. If the latter, then the MegaWizard will tell you how many M9K blocks are required.

    Try taking as small a portion of your RTL code as possible and compile just this to get a better idea how it is being synthesized.
  • Altera_Forum

    Thanks guys, this is how I instantiate RAMs:

    generate
        for (i=0; i < memsize; i=i+1) begin : MEM
            memory #( .soft_bits(soft_bits), .z(normalized_address_width), .address_width(address_width)) U ( .clock(sys_clk), .data(data_in), .rdaddress(address_out), .rden(read), .wraddress(address_in), .wren(write), .q(data_out));
        end
    endgenerate

    I then used the MegaWizard to generate the RAMs. I made some changes to make the module parametric, but did not change anything else.

    Note that the RAM modules I generate are very small, but they are dual port, and I didn't see anything about merging small memory blocks in the datasheets.

    `timescale 1 ps / 1 ps
    module memory (
        clock,
        data,
        rdaddress,
        rden,
        wraddress,
        wren,
        q);

        parameter soft_bits=5;
        parameter z=4;
        parameter address_width=2;

        input clock;
        input [soft_bits-1:0] data;
        input [address_width-1:0] rdaddress;
        input rden;
        input [address_width-1:0] wraddress;
        input wren;
        output [soft_bits-1:0] q;

    `ifndef ALTERA_RESERVED_QIS
    // synopsys translate_off
    `endif
        tri1 clock;
        tri1 rden;
        tri0 wren;
    `ifndef ALTERA_RESERVED_QIS
    // synopsys translate_on
    `endif

        wire [soft_bits-1:0] sub_wire0;
        wire [soft_bits-1:0] q = sub_wire0[soft_bits-1:0];

        altsyncram altsyncram_component (
            .address_a (wraddress),
            .clock0 (clock),
            .data_a (data),
            .rden_b (rden),
            .wren_a (wren),
            .address_b (rdaddress),
            .q_b (sub_wire0),
            .aclr0 (1'b0),
            .aclr1 (1'b0),
            .addressstall_a (1'b0),
            .addressstall_b (1'b0),
            .byteena_a (1'b1),
            .byteena_b (1'b1),
            .clock1 (1'b1),
            .clocken0 (1'b1),
            .clocken1 (1'b1),
            .clocken2 (1'b1),
            .clocken3 (1'b1),
            .data_b ({soft_bits{1'b1}}),
            .eccstatus (),
            .q_a (),
            .rden_a (1'b1),
            .wren_b (1'b0));
        defparam
            altsyncram_component.address_aclr_b = "NONE",
            altsyncram_component.address_reg_b = "CLOCK0",
            altsyncram_component.clock_enable_input_a = "BYPASS",
            altsyncram_component.clock_enable_input_b = "BYPASS",
            altsyncram_component.clock_enable_output_b = "BYPASS",
            altsyncram_component.intended_device_family = "Stratix IV",
            altsyncram_component.lpm_type = "altsyncram",
            altsyncram_component.numwords_a = z,
            altsyncram_component.numwords_b = z,
            altsyncram_component.operation_mode = "DUAL_PORT",
            altsyncram_component.outdata_aclr_b = "NONE",
            altsyncram_component.outdata_reg_b = "CLOCK0",
            altsyncram_component.power_up_uninitialized = "FALSE",
            altsyncram_component.rdcontrol_reg_b = "CLOCK0",
            altsyncram_component.read_during_write_mode_mixed_ports = "OLD_DATA",
            altsyncram_component.widthad_a = address_width,
            altsyncram_component.widthad_b = address_width,
            altsyncram_component.width_a = soft_bits,
            altsyncram_component.width_b = soft_bits,
            altsyncram_component.width_byteena_a = 1;

    endmodule
  • Altera_Forum

    Are the input and output address and data signals of an array type? How are they connected in the upper module? Are the control signals of the generated RAM blocks identical or different? Without this information, you can't know whether the RAM blocks can be merged. As said, you should check the RTL netlist of the compiled design; it shows the real connections of the inferred RAM blocks.

    P.S.: Please consider that the compilation results of a test design that doesn't completely connect all RAM instances at the outside would be meaningless.
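
    To make the questions above concrete, here is a minimal sketch of a generate loop in which every RAM instance receives its own slice of each bus, so no two instances share data, address or enable signals. The wrapper name `mem_bank`, the flattened-bus layout, the default parameter values and the choice of `1 << address_width` for `z` are all assumptions made for illustration; only the `memory` module and its port names come from this thread.

    ```verilog
    // Hypothetical wrapper (not from the original design): each instance of
    // the `memory` module posted above gets a private slice of every bus.
    module mem_bank #(
        parameter memsize       = 8,  // number of memories (assumed value)
        parameter soft_bits     = 4,  // data width per memory (assumed value)
        parameter address_width = 4   // address width per memory (assumed value)
    ) (
        input  wire                             sys_clk,
        input  wire [memsize*soft_bits-1:0]     data_in,      // one data word per instance
        input  wire [memsize*address_width-1:0] address_in,   // one write address per instance
        input  wire [memsize*address_width-1:0] address_out,  // one read address per instance
        input  wire [memsize-1:0]               write,        // one write enable per instance
        input  wire [memsize-1:0]               read,         // one read enable per instance
        output wire [memsize*soft_bits-1:0]     data_out      // one output word per instance
    );
        genvar i;
        generate
            for (i = 0; i < memsize; i = i + 1) begin : MEM
                memory #(
                    .soft_bits     (soft_bits),
                    .z             (1 << address_width),  // number of words (assumption)
                    .address_width (address_width)
                ) U (
                    .clock     (sys_clk),
                    .data      (data_in    [i*soft_bits     +: soft_bits]),
                    .rdaddress (address_out[i*address_width +: address_width]),
                    .rden      (read[i]),
                    .wraddress (address_in [i*address_width +: address_width]),
                    .wren      (write[i]),
                    .q         (data_out   [i*soft_bits     +: soft_bits])
                );
            end
        endgenerate
    endmodule
    ```

    Whether the original design wires its instances like this, or shares address and control signals among them, is exactly what the RTL netlist check would reveal.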
  • Altera_Forum

    Here is a very basic test result. You can again see that the fitter merged 6 memory blocks into 1 M9K (M9K_X3_Y4_N0). I understand that, since the inputs and outputs are not connected in this sample, it might not be accurate, but to my surprise it does the same merges even in the full design with all connections.

    As you said, I checked the RTL netlist of my design and everything makes perfect sense: all simple dual-port RAM blocks are treated as single RAM blocks with separate inputs and outputs, but I am observing the same merging effect.

    The question is how this is even possible. I didn't see anything about merging anywhere in Altera's literature!
  • Altera_Forum

    It has been said.

    --- Quote Start ---

    Please consider, that compilation results of a test design, that doesn't completely connect all RAM instances at the outside, would be meaningless.

    --- Quote End ---

    Generally, Quartus integrated synthesis will optimize any part of the design that it is able to. That is what it does in the present case.