Forum Discussion

k_dz
New Contributor
1 year ago
Solved

Inferring RAM with byte-enable from parametrized SystemVerilog

I'm trying to write a RAM with byte-enable inputs and a parametrizable number of bytes per word. The byte-enable RAM templates in the Lite and Standard editions of Quartus assume a fixed number of bytes per word.

I tried the following code, straight from the User Guide, but it is not inferred as RAM and uses registers instead of M10K memory blocks (target is Cyclone V):

module byte_enabled_simple_dual_port_ram
(
    input we, clk,
    input [ADDRESS_WIDTH-1:0] waddr, raddr, // address width = 6
    input [NUM_BYTES-1:0] be, // 4 bytes per word
    input [(BYTE_WIDTH * NUM_BYTES -1):0] wdata, // byte width = 8, 4 bytes per word
    output reg [(BYTE_WIDTH * NUM_BYTES -1):0] q // byte width = 8, 4 bytes per word
);

   parameter ADDRESS_WIDTH = 6;
   parameter DEPTH = 2**ADDRESS_WIDTH;
   parameter BYTE_WIDTH = 8;
   parameter NUM_BYTES = 4;

   // use a multi-dimensional packed array
   // to model individual bytes within the word
   logic [NUM_BYTES-1:0][BYTE_WIDTH-1:0] ram[0:DEPTH-1];
   // # words = 1 << address width

   // port A
   always@(posedge clk)
   begin
      if(we) begin
          for (int i = 0; i < NUM_BYTES; i = i + 1) begin
            if(be[i]) ram[waddr][i] <= wdata[i*BYTE_WIDTH +: BYTE_WIDTH];
          end
      end
      q <= ram[raddr];
   end
endmodule

After manually unrolling the loop to the following form (assuming NUM_BYTES ≤ 8), Quartus can properly infer RAM and map it to memory resources:

   always@(posedge clk)
   begin
      if(we) begin
            if(be[0]) ram[waddr][0] <= wdata[0*BYTE_WIDTH +: BYTE_WIDTH];
            if(NUM_BYTES>1) if(be[1]) ram[waddr][1] <= wdata[1*BYTE_WIDTH +: BYTE_WIDTH];
            if(NUM_BYTES>2) if(be[2]) ram[waddr][2] <= wdata[2*BYTE_WIDTH +: BYTE_WIDTH];
            if(NUM_BYTES>3) if(be[3]) ram[waddr][3] <= wdata[3*BYTE_WIDTH +: BYTE_WIDTH];
            if(NUM_BYTES>4) if(be[4]) ram[waddr][4] <= wdata[4*BYTE_WIDTH +: BYTE_WIDTH];
            if(NUM_BYTES>5) if(be[5]) ram[waddr][5] <= wdata[5*BYTE_WIDTH +: BYTE_WIDTH];
            if(NUM_BYTES>6) if(be[6]) ram[waddr][6] <= wdata[6*BYTE_WIDTH +: BYTE_WIDTH];
            if(NUM_BYTES>7) if(be[7]) ram[waddr][7] <= wdata[7*BYTE_WIDTH +: BYTE_WIDTH];
      end
      q <= ram[raddr];
   end

I tried adding the (* ramstyle = "M10K" *) synthesis attribute to the version that uses the for loop, but it didn't help.

Is there a way to do this without repeating almost the same line many times? Or do the Lite and Standard editions simply not handle loops properly, despite the code example in the Quartus Pro User Guide?

Tested with Quartus Prime Lite 23.1.1 and Quartus Prime Standard 22.1.

10 Replies

  • sstrell
    Super Contributor

    Well that's embarrassing that the documented template doesn't work!

    Where/how are you defining the parameter values? Does the compiler give any warning or explanation why registers are being used instead of RAM blocks?

  • k_dz
    New Contributor

    To make sure that nothing else affects the synthesis, I set the file containing only the byte_enabled_simple_dual_port_ram module as the top-level entity and let it use the default parameter values.

    Even with HDL_MESSAGE_LEVEL LEVEL3, SYNTH_MESSAGE_LEVEL HIGH and the ramstyle = "M10K" attribute, there are no messages regarding the ram variable, except for

    Info (10008): Verilog HDL or VHDL information: EDA Netlist Writer cannot regroup multidimensional array "ram" into its bus

    I would expect Quartus to at least point out why the synthesis attribute is ignored.

    • k_dz
      New Contributor

      Here's the project archive containing both the version with for (set as top level, not inferred as RAM) and the one that has been manually unrolled (inferred as RAM). The _w_output archive additionally contains report files.

  • KennyT_altera
    Super Contributor

    Thanks for sharing your design. After investigation, this user guide applies only to devices supported by Quartus Prime Pro (Stratix 10, Agilex 7, etc.).

    It is not applicable to the Standard edition: https://www.intel.com/content/www/us/en/docs/programmable/683082/24-3/ram-with-byte-enable-signals.html#mwh1409959589276__example_SystemVerilog_Simple

    In your Quartus installation, if you right-click and choose Insert Template, you will see this example in the Standard edition.

  • KennyT_altera
    Super Contributor

    I tested the inserted template from Quartus Prime Std, and it is not able to infer the M10K either. You will need to use the RAM IP directly from Quartus Prime Std to map the memory to M10K blocks.
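
    Using the IP directly could look roughly like the wrapper below. This is only a sketch: the wrapper name byte_enabled_sdp_ram_ip is hypothetical, and the altsyncram parameter values (block type, device family, read-during-write mode, supported byte sizes) are assumptions that should be checked against the embedded memory IP documentation for the target device.

    ```systemverilog
    // Hypothetical wrapper around the altsyncram megafunction.
    // Parameter values are assumptions; verify them for your device.
    module byte_enabled_sdp_ram_ip #(
        parameter ADDRESS_WIDTH = 6,
        parameter BYTE_WIDTH    = 8,
        parameter NUM_BYTES     = 4
    ) (
        input  logic                            clk,
        input  logic                            we,
        input  logic [ADDRESS_WIDTH-1:0]        waddr,
        input  logic [ADDRESS_WIDTH-1:0]        raddr,
        input  logic [NUM_BYTES-1:0]            be,
        input  logic [BYTE_WIDTH*NUM_BYTES-1:0] wdata,
        output logic [BYTE_WIDTH*NUM_BYTES-1:0] q
    );
        localparam DATA_WIDTH = BYTE_WIDTH * NUM_BYTES;

        altsyncram #(
            .operation_mode                     ("DUAL_PORT"),  // simple dual-port
            .intended_device_family             ("Cyclone V"),
            .ram_block_type                     ("M10K"),
            .width_a                            (DATA_WIDTH),
            .widthad_a                          (ADDRESS_WIDTH),
            .numwords_a                         (2**ADDRESS_WIDTH),
            .width_b                            (DATA_WIDTH),
            .widthad_b                          (ADDRESS_WIDTH),
            .numwords_b                         (2**ADDRESS_WIDTH),
            .width_byteena_a                    (NUM_BYTES),    // one enable bit per byte
            .byte_size                          (BYTE_WIDTH),
            .address_reg_b                      ("CLOCK0"),     // read address registered on clk
            .outdata_reg_b                      ("UNREGISTERED"),
            .read_during_write_mode_mixed_ports ("OLD_DATA")    // matches the nonblocking template
        ) ram_inst (
            .clock0    (clk),
            .wren_a    (we),
            .address_a (waddr),
            .data_a    (wdata),
            .byteena_a (be),
            .address_b (raddr),
            .q_b       (q)
        );
    endmodule
    ```

    The port list mirrors the original byte_enabled_simple_dual_port_ram module, so it should be usable as a drop-in replacement. Unconnected altsyncram ports are left at their defaults here.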

  • RichardT_altera
    Super Contributor

    May I know if you need further help regarding this case?


    Regards,

    Richard Tan


  • KennyT_altera
    Super Contributor

    As we have not received a response to our previous answer, please log in to ‘https://supporttickets.intel.com/s/?language=en_US’, view the details of the desired request, and post a response within the next 15 days so that I can continue to support you. After 15 days, this thread will be transitioned to community support, where community users will be able to help with your follow-up questions.



  • FvM
    Super Contributor

    Hi,

    I found a better solution for Quartus Std.: use a generate for loop instead of a simple for loop.

    genvar i;
    generate for (i = 0; i < NUM_BYTES; i++) begin : gen_be
       always@(posedge clk)
          if(we & be[i]) ram[waddr][i] <= wdata[i*BYTE_WIDTH +: BYTE_WIDTH];
    end
    endgenerate

    always@(posedge clk)
    begin
       q <= ram[raddr];
    end

    Regards
    Frank

    • k_dz
      New Contributor

      Although it uses dedicated memory resources and does not require repeating a single line, I'm not sure it's a better solution. In my testing it maps to NUM_BYTES separate M10K blocks and I couldn't find a way to convert it to write-first/"read new data", as Quartus requires blocking assignments to infer such same-port read-during-write behaviour.
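
      For reference, the blocking-assignment style meant here looks roughly like the following for a plain RAM without byte enables. This is a sketch modeled on the usual "new data" coding style; the module name sdp_ram_new_data and the DATA_WIDTH parameter are made up for illustration, and whether a given Quartus edition maps it to M10K is not verified here.

      ```systemverilog
      // Sketch: single-clock simple dual-port RAM that returns the new
      // data when raddr == waddr during a write. No byte enables.
      module sdp_ram_new_data #(
          parameter ADDRESS_WIDTH = 6,
          parameter DATA_WIDTH    = 32
      ) (
          input  logic                     clk,
          input  logic                     we,
          input  logic [ADDRESS_WIDTH-1:0] waddr,
          input  logic [ADDRESS_WIDTH-1:0] raddr,
          input  logic [DATA_WIDTH-1:0]    wdata,
          output logic [DATA_WIDTH-1:0]    q
      );
          logic [DATA_WIDTH-1:0] ram [0:2**ADDRESS_WIDTH-1];

          always @(posedge clk) begin
              if (we)
                  ram[waddr] = wdata;  // blocking: the write takes effect first
              q = ram[raddr];          // blocking: the read then sees the new word
          end
      endmodule
      ```

      This only works because the write and the read sit in one always block, which fixes their order. With the generate version, the writes live in NUM_BYTES separate always blocks and the read in another, and the execution order of separate always blocks on the same clock edge is not defined, so the same trick does not carry over.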

      I know that I can just generate N separately driven RAM modules, each storing the i-th byte of an N-byte word, but that is just wasteful in many cases.
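
      The N-separate-RAMs alternative could be sketched as below. The module names byte_lane_ram and byte_enabled_ram_lanes are hypothetical, and the block counts that Quartus reports for this structure have not been checked here.

      ```systemverilog
      // One byte-wide simple dual-port RAM (old data on read-during-write).
      module byte_lane_ram #(
          parameter ADDRESS_WIDTH = 6,
          parameter BYTE_WIDTH    = 8
      ) (
          input  logic                     clk,
          input  logic                     we,
          input  logic [ADDRESS_WIDTH-1:0] waddr,
          input  logic [ADDRESS_WIDTH-1:0] raddr,
          input  logic [BYTE_WIDTH-1:0]    wdata,
          output logic [BYTE_WIDTH-1:0]    q
      );
          logic [BYTE_WIDTH-1:0] ram [0:2**ADDRESS_WIDTH-1];

          always_ff @(posedge clk) begin
              if (we) ram[waddr] <= wdata;
              q <= ram[raddr];
          end
      endmodule

      // NUM_BYTES lanes side by side; lane i stores byte i of every word.
      module byte_enabled_ram_lanes #(
          parameter ADDRESS_WIDTH = 6,
          parameter BYTE_WIDTH    = 8,
          parameter NUM_BYTES     = 4
      ) (
          input  logic                            clk,
          input  logic                            we,
          input  logic [ADDRESS_WIDTH-1:0]        waddr,
          input  logic [ADDRESS_WIDTH-1:0]        raddr,
          input  logic [NUM_BYTES-1:0]            be,
          input  logic [BYTE_WIDTH*NUM_BYTES-1:0] wdata,
          output logic [BYTE_WIDTH*NUM_BYTES-1:0] q
      );
          genvar i;
          generate
              for (i = 0; i < NUM_BYTES; i++) begin : gen_lane
                  byte_lane_ram #(
                      .ADDRESS_WIDTH (ADDRESS_WIDTH),
                      .BYTE_WIDTH    (BYTE_WIDTH)
                  ) lane (
                      .clk   (clk),
                      .we    (we & be[i]),  // byte enable gates this lane's write
                      .waddr (waddr),
                      .raddr (raddr),
                      .wdata (wdata[i*BYTE_WIDTH +: BYTE_WIDTH]),
                      .q     (q[i*BYTE_WIDTH +: BYTE_WIDTH])
                  );
              end
          endgenerate
      endmodule
      ```

      Each lane is an ordinary single-always-block RAM, so a per-lane read-during-write style can be chosen inside byte_lane_ram. The cost is the one mentioned above: at least one memory block per byte lane, even when a single block could hold the whole word.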