Weird results assigning memory blocks

Question

Hey Guys,

I have this weird situation (to me at least) and wanted to know if anyone here has any insights on it.

I was trying to synthesize a design on an EP4SGX530 Stratix IV FPGA. I needed 2364 dual-port memory blocks, all accessed in parallel. Each of these memories is very small, with both address and data widths of 4. I looked at the specs for the EP4SGX530 and noticed that it has only 1280 M9Ks, so I figured that since at most 1280 memories could be read and written simultaneously, it might not be possible.

Still, I decided to try it: I created the memory modules using the “altsyncram megafunction” and synthesized the design.

To my surprise, it successfully finished synthesis, place and route!

I checked the fitter report, and it reported that only 264 of the available 1280 M9K blocks were used! This means that somehow it sliced an M9K block into 8 independent dual-port memories with independent clocks and read and write ports, and fitted 2364 memories into these 264 M9Ks, but I thought this was not possible!
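As a rough sanity check on the numbers above, the sketch below compares raw bit capacity with the block counts. It assumes an M9K holds 9,216 bits and that an address width of 4 means 16 words per memory; the variable names are illustrative only.

```python
# Back-of-the-envelope capacity check (assumptions noted inline).
M9K_BITS = 9 * 1024      # assumed: 9 Kbit per M9K block
NUM_MEMORIES = 2364      # memories requested in the design
ADDRESS_WIDTH = 4        # assumed: 2**4 = 16 words per memory
DATA_WIDTH = 4           # bits per word
M9K_USED = 264           # M9K blocks reported by the fitter

bits_per_memory = (2 ** ADDRESS_WIDTH) * DATA_WIDTH
total_bits = NUM_MEMORIES * bits_per_memory
memories_per_block = NUM_MEMORIES / M9K_USED

print(bits_per_memory)                        # 64
print(total_bits)                             # 151296
print(total_bits <= M9K_USED * M9K_BITS)      # True
print(round(memories_per_block, 2))           # 8.95
```

Under these assumptions the bits fit easily, so the puzzle is the number of independent ports rather than storage capacity.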

I even tested the design, and it is working fine, so am I missing something here?

Sina

altera_forum · Answer

You cannot split an M9K block into multiple independent true dual-port memories as you suggest. You can split an M9K into two independent simple dual-port memories, but that does not explain what you observed. I suggest you check that you didn't get any messages during synthesis saying that memory blocks were synthesized away for some reason, although that would not explain why your design apparently functions as you expect. Perhaps QSyn was able to merge multiple memories that you thought were independent into fewer memory blocks than you expected.
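To put a number on the two-per-block limit mentioned above, here is a small sketch. It takes the statement that one M9K can hold at most two independent simple dual-port memories as its only packing rule, which is an assumption for illustration and not a description of how the fitter actually packs RAMs.

```python
import math

NUM_MEMORIES = 2364      # memories in the design, from the question
MAX_PER_M9K = 2          # assumed packing limit: two simple dual-port RAMs per M9K
M9K_AVAILABLE = 1280     # M9K blocks on the device, from the question
M9K_REPORTED = 264       # M9K blocks the fitter reported as used

# Fewest blocks needed if every block holds the assumed maximum.
min_blocks = math.ceil(NUM_MEMORIES / MAX_PER_M9K)

print(min_blocks)                      # 1182
print(min_blocks <= M9K_AVAILABLE)     # True
print(M9K_REPORTED >= min_blocks)      # False
```

Under that assumption the design would need at least 1182 blocks, which fits on the device but is far more than the 264 reported, so this kind of packing alone does not account for the result.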

altera_forum · Answer

Thanks, Jimbo ;)

Please look at the attached fitter report.

In section 10 (HardCopy Device Resource Guide), you can see that 292/1280 M9K RAMs are used.

Now look at section 26 (Fitter RAM Summary): if you add up the numbers in the M9K column, you get a total of 2399! (I summed them in Excel.)

Also, if you look at the locations there, you can see that many M9K locations are used by multiple instances. For example, location M9K_X122_Y86_N0 is used by MEM[0], MEM[1500], MEM[1540], MEM[1775], MEM[1800], MEM[1840], MEM[295], MEM[80], MEM[950] and MEM[965].
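One way to see the sharing at a glance is to group the RAM summary rows by location. The sketch below assumes the rows have already been extracted into (instance, location) pairs; the helper name `group_by_location` is hypothetical, and the sample data is just the one location quoted above.

```python
from collections import defaultdict

def group_by_location(rows):
    """Map each M9K location to the list of instances placed in it."""
    groups = defaultdict(list)
    for instance, location in rows:
        groups[location].append(instance)
    return dict(groups)

# Sample rows: the instances listed above for a single location.
indices = [0, 1500, 1540, 1775, 1800, 1840, 295, 80, 950, 965]
rows = [(f"MEM[{i}]", "M9K_X122_Y86_N0") for i in indices]

groups = group_by_location(rows)
for location, instances in groups.items():
    print(location, len(instances))
```

Run over a full report, the number of keys would give the count of distinct M9K locations, while the sum of the list lengths would give the per-instance total.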

This does not make sense to me at all.

You said it might have merged memories. How can I check that?

Thanks,

Sina

altera_forum · Answer

Assuming that Quartus doesn't merge RAM blocks erroneously, the RTL must allow it. Without knowing the RTL, it's impossible to determine how the RAM blocks can be merged. You can simply take one example of blocks going to the same location and check which signals are connected to them in the RTL netlist.
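A minimal sketch of that kind of check: given the nets connected to each instance's input ports, group the instances whose inputs are identical, since those are natural candidates for merging. The port names, net names, and the `find_identical_inputs` helper are all hypothetical; in practice the connections would be read from the RTL netlist.

```python
from collections import defaultdict

# Hypothetical input port names for a simple dual-port RAM wrapper.
INPUT_PORTS = ("clock", "data", "wraddress", "wren", "rdaddress", "rden")

def find_identical_inputs(connections):
    """Return groups of instance names whose input ports see the same nets."""
    groups = defaultdict(list)
    for name, ports in connections.items():
        key = tuple(ports[p] for p in INPUT_PORTS)
        groups[key].append(name)
    # Only groups with more than one instance are interesting.
    return [sorted(names) for names in groups.values() if len(names) > 1]

# Hypothetical connections: ram_a and ram_b share every input, ram_c does not.
base = {"clock": "clk", "data": "d0", "wraddress": "wa0",
        "wren": "we0", "rdaddress": "ra0", "rden": "re0"}
connections = {
    "ram_a": dict(base),
    "ram_b": dict(base),
    "ram_c": dict(base, data="d1"),
}

shared = find_identical_inputs(connections)
print(shared)  # [['ram_a', 'ram_b']]
```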

altera_forum · Answer

Are you inferring the memory, or using the MegaWizard? If the former, check the section of the Quartus II Handbook on synthesis that covers inferring dual-port memory. Your RTL must be written in the specific style detailed in the handbook for inference to work. If the latter, the MegaWizard will tell you how many M9K blocks are required.

Try taking as small a portion of your RTL code as possible and compiling just that to get a better idea of how it is being synthesized.

altera_forum · Answer

Thanks, guys. This is how I instantiate the RAMs:

generate
    for (i = 0; i < memsize; i = i + 1) begin : MEM
        memory #(.soft_bits(soft_bits), .z(normalized_address_width), .address_width(address_width))
            U (.clock(sys_clk), .data(data_in), .rdaddress(address_out), .rden(read),
               .wraddress(address_in), .wren(write), .q(data_out));
    end
endgenerate

I use the MegaWizard to generate the RAMs. I made some changes to make the generated module parametric, but did not change anything else.

Note that the RAM modules I generate are very small, but they are dual-port, and I didn't see anything about merging small memory blocks in the datasheets.

`timescale 1 ps / 1 ps
module memory (
    clock,
    data,
    rdaddress,
    rden,
    wraddress,
    wren,
    q);

    parameter soft_bits=5;
    parameter z=4;
    parameter address_width=2;

    input clock;
    input [soft_bits-1:0] data;
    input [address_width-1:0] rdaddress;
    input rden;
    input [address_width-1:0] wraddress;
    input wren;
    output [soft_bits-1:0] q;

`ifndef ALTERA_RESERVED_QIS
// synopsys translate_off
`endif
    tri1 clock;
    tri1 rden;
    tri0 wren;
`ifndef ALTERA_RESERVED_QIS
// synopsys translate_on
`endif

    wire [soft_bits-1:0] sub_wire0;
    wire [soft_bits-1:0] q = sub_wire0[soft_bits-1:0];

    altsyncram altsyncram_component (
        .address_a (wraddress),
        .clock0 (clock),
        .data_a (data),
        .rden_b (rden),
        .wren_a (wren),
        .address_b (rdaddress),
        .q_b (sub_wire0),
        .aclr0 (1'b0),
        .aclr1 (1'b0),
        .addressstall_a (1'b0),
        .addressstall_b (1'b0),
        .byteena_a (1'b1),
        .byteena_b (1'b1),
        .clock1 (1'b1),
        .clocken0 (1'b1),
        .clocken1 (1'b1),
        .clocken2 (1'b1),
        .clocken3 (1'b1),
        .data_b ({soft_bits{1'b1}}),
        .eccstatus (),
        .q_a (),
        .rden_a (1'b1),
        .wren_b (1'b0));
    defparam
        altsyncram_component.address_aclr_b = "NONE",
        altsyncram_component.address_reg_b = "CLOCK0",
        altsyncram_component.clock_enable_input_a = "BYPASS",
        altsyncram_component.clock_enable_input_b = "BYPASS",
        altsyncram_component.clock_enable_output_b = "BYPASS",
        altsyncram_component.intended_device_family = "Stratix IV",
        altsyncram_component.lpm_type = "altsyncram",
        altsyncram_component.numwords_a = z,
        altsyncram_component.numwords_b = z,
        altsyncram_component.operation_mode = "DUAL_PORT",
        altsyncram_component.outdata_aclr_b = "NONE",
        altsyncram_component.outdata_reg_b = "CLOCK0",
        altsyncram_component.power_up_uninitialized = "FALSE",
        altsyncram_component.rdcontrol_reg_b = "CLOCK0",
        altsyncram_component.read_during_write_mode_mixed_ports = "OLD_DATA",
        altsyncram_component.widthad_a = address_width,
        altsyncram_component.widthad_b = address_width,
        altsyncram_component.width_a = soft_bits,
        altsyncram_component.width_b = soft_bits,
        altsyncram_component.width_byteena_a = 1;

endmodule
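For reference, the storage implied by the wrapper's parameters can be worked out directly. This sketch uses the module's default parameter values and assumes the number of words must not exceed 2 to the power of the address width; the function name `memory_bits` is hypothetical.

```python
def memory_bits(soft_bits, z, address_width):
    """Return the total bits for a RAM with z words of soft_bits each."""
    # Assumed consistency rule: every word must be addressable.
    if z > 2 ** address_width:
        raise ValueError("numwords exceeds the addressable range")
    return z * soft_bits

# Defaults from the module: soft_bits=5, z=4, address_width=2.
default_bits = memory_bits(soft_bits=5, z=4, address_width=2)
print(default_bits)  # 20
```

With the widths mentioned in the original question (4-bit data, 4-bit address, 16 words assumed), the same function gives 64 bits per memory.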
