Forum Discussion

Altera_Forum
15 years ago

What did I do wrong in my Verilog project? (Quartus cannot compile it)

Hi,

I am trying to implement a data conversion algorithm. It has 8 inputs clocked at 400 MHz and a 1-bit output clocked at about 1 MHz.

In this project I need to detect when my input starts and then collect 192*1024 bits of continuous input into "RingData", together with a time stamp in "TimeStep". I read and write RingData only once each; the same is true for TimeStep.

When I start it in Quartus (on Windows XP Professional 32-bit), I wait about 5 minutes and it is still at 9% of synthesis. The last messages I receive from Quartus are:

Info: Found 2 instances of uninferred RAM logic

Info: RAM logic "RingData" is uninferred due to asynchronous read logic

Info: RAM logic "TimeStep" is uninferred due to inappropriate RAM size

Warning: Cannot convert all sets of registers into RAM megafunctions when creating nodes; therefore the resulting number of registers remaining in design can cause longer compilation time or result in insufficient memory to complete Analysis and Synthesis


It seems I did something wrong, but I cannot figure out what. Please advise me what to do; my complete Verilog file is attached.

Sincerely,

Ilghiz

PS: I simplified the source by removing unnecessary mathematics.


module My_First_Project (InData, InReady, OutClock, OutData);
parameter MaxK=8;
parameter MaxN=MaxK*1024;
input   InData;
input  InReady, OutClock;
output OutData;
reg    OutData;
reg signed  LocData1, aLD0, aLD1, aLD2, aLD3, aLD4;
reg signed  LD0, LD1, LD2, LD3, LD4;
reg  DeMuxCounter;
reg  Temp;
reg  H00, H01, H02, H03, H04, H05, H06, H07, H08, H09;
reg  H10, H11, H12, H13, H14, H15, H16, H17, H18, H19;
reg  H20, H21, H22, H23, H24, H25, H26, H27, H28, H29;
reg  H30, H31, H32, H33, H34, H35, H36, H37, H38, H39;
reg       G01, G02, G03, G04, G05, G06, G07, G08, G09;
reg  G10, G11, G12, G13, G14, G15, G16, G17, G18, G19;
reg  G20, G21, G22, G23, G24, G25, G26, G27, G28, G29;
reg  G30, G31, G32, G33, G34, G35, G36, G37, G38, G39;
reg  NadoT;
reg  CurTime;
reg        NextStep;
reg  RingData ;
reg  TimeStep ;
reg  BeginPos;
reg  EndPos;
reg  CurShiftData;
reg   CurPos;
reg EndPosSw;
initial
begin
  CurTime=0;
  NadoT=0;
  DeMuxCounter=0;
  BeginPos=0;
  EndPos=0;
  CurPos=79;
  EndPosSw=1;
end
always @(posedge InReady)
begin
  LocData1={InData, Temp, InData, Temp, InData, Temp, InData, Temp,
            InData, Temp, InData, Temp, InData, Temp, InData, Temp};
end
always @(negedge InReady) Temp<=InData;
always @(LocData1)
begin
  if(DeMuxCounter)
  begin
    DeMuxCounter<=DeMuxCounter+1;
    {aLD0, aLD1, aLD2, aLD3, aLD4}<={aLD1, aLD2, aLD3, aLD4, LocData1};
    NextStep<=0;
  end
  else
  begin
    DeMuxCounter<=DeMuxCounter+1;
    {aLD0, aLD1, aLD2, aLD3, aLD4} <= {aLD1, aLD2, aLD3, aLD4, LocData1};
    {LD0, LD1, LD2, LD3, LD4} <= {aLD1, aLD2, aLD3, aLD4, LocData1};
    NextStep<=1;
  end
end
always @(posedge NextStep)
begin
  begin
    G39<=H38; G38<=H37; G37<=H36; G36<=H35; G35<=H34; G34<=H33; G33<=H32; G32<=H31; G31<=H30; G30<=H29;
    G29<=H28; G28<=H27; G27<=H26; G26<=H25; G25<=H24; G24<=H23; G23<=H22; G22<=H21; G21<=H20; G20<=H19;
    G19<=H18; G18<=H17; G17<=H16; G16<=H15; G15<=H14; G14<=H13; G13<=H12; G12<=H11; G11<=H10; G10<=H09;
    G09<=H08; G08<=H07; G07<=H06; G06<=H05; G05<=H04; G04<=H03; G03<=H02; G02<=H01; G01<=H00;
    if(NadoT)
    begin
      RingData=H39;
      NadoT=NadoT-1;
      if((BeginPos&1023)==0) TimeStep]=CurTime;
      BeginPos=BeginPos+1;
    end
  end
 
  begin
    CurTime<=CurTime+1;
  
    H39<=G39; H38<=G38; H37<=G37; H36<=G36; H35<=G35; H34<=G34; H33<=G33; H32<=G32; H31<=G31; H30<=G30;
    H29<=G29; H28<=G28; H27<=G27; H26<=G26; H25<=G25; H24<=G24; H23<=G23; H22<=G22; H21<=G21; H20<=G20;
    H19<=G19; H18<=G18; H17<=G17; H16<=G16; H15<=G15; H14<=G14; H13<=G13; H12<=G12; H11<=G11; H10<=G10;
    H09<=G09; H08<=G08; H07<=G07; H06<=G06; H05<=G05; H04<=G04; H03<=G03; H02<=G02; H01<=G01;
    H00<={LD1, LD2, LD3, LD4};
    if(LD1>=2000 && LD2>=2000 && LD3>=2000 && LD4>=2000)
    begin
      NadoT<=(NadoT&1023)+3072;
    end
  end
end
always @(posedge OutClock)
begin
  {CurShiftData, OutData}=CurShiftData;
  CurPos=CurPos-1;
  if(CurPos==0)
  begin
    CurPos=79;
    CurShiftData=EndPos;
    if((EndPos&1023)==0 && EndPosSw)
    begin
      CurShiftData=TimeStep];
      EndPosSw=0;
    end
    else
    begin
      CurShiftData=RingData;
      EndPosSw=1;
      EndPos=EndPos+1;
    end
  end
end
endmodule

6 Replies

  • Altera_Forum

    I found a construct in your code that's absolutely not synthesizable. You can't build a counter without a clock (a posedge or negedge condition):

    always @(LocData1)
    begin
      if(DeMuxCounter)
      begin
        DeMuxCounter<=DeMuxCounter+1;

    You should check what you want to achieve here and find a clear synchronous construct for it. I also noticed ripple clocks in the design (registers clocked by the output of other registers, such as NextStep) that may prevent timing closure.
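    One synchronous way to express this, sketched under assumptions: the intent is to collect four consecutive samples and signal each complete group, everything runs on the InReady clock, and a one-cycle enable replaces the ripple clock "posedge NextStep". The module and port names (demux4, step_en, ld1..ld4) and the width W are hypothetical, because the bus widths in the posted code were lost.

```verilog
// Hypothetical sketch: clocked counter plus clock enable instead of
// always @(LocData1) and always @(posedge NextStep).
module demux4 #(parameter W = 16) (
    input  wire         clk,                    // e.g. InReady
    input  wire [W-1:0] din,                    // one sample per clock
    output reg  [W-1:0] ld1, ld2, ld3, ld4,     // last four samples, oldest first
    output reg          step_en                 // one-clock pulse: new group in ld1..ld4
);
    reg [1:0]   demux_cnt = 2'd0;               // the counter now has a real clock
    reg [W-1:0] s1, s2, s3;                     // previous three samples, s1 oldest

    always @(posedge clk) begin
        demux_cnt    <= demux_cnt + 2'd1;
        {s1, s2, s3} <= {s2, s3, din};          // shift the new sample in
        step_en      <= (demux_cnt == 2'd3);    // high for the clock after each 4th sample
        if (demux_cnt == 2'd3)
            {ld1, ld2, ld3, ld4} <= {s1, s2, s3, din};
    end
endmodule
```

    Logic that currently sits in always @(posedge NextStep) would then move into always @(posedge InReady) if (step_en) begin ... end, so the whole datapath stays in one clock domain.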

    You mentioned an input clock speed of 400 MHz. Do you mean that InReady is 400 MHz or 200 MHz?

    I don't immediately see an asynchronous read of RingData. I wonder if it has to do with the use of blocking assignments. Altera's RAM inference examples use non-blocking assignments exclusively, matching the synchronous function of the RAM. Or perhaps you removed the problem when simplifying the code.

    In any case, I fear the design can't compile unless the large buffer structure is inferred as RAM.
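    A sketch of a buffer written in the style of those RAM inference examples, assuming separate write and read clocks: the write and the read both sit in edge-triggered always blocks with non-blocking assignments, and the read data is registered, so there is no asynchronous read. DATA_W and ADDR_W default to the 64 x 8192 RingData size mentioned later in the thread; the module and port names are hypothetical.

```verilog
// Hypothetical simple dual-port, dual-clock RAM with a registered read.
module ring_ram #(
    parameter DATA_W = 64,
    parameter ADDR_W = 13                       // 2**13 = 8192 words
) (
    input  wire              wr_clk,            // e.g. the internal sample clock
    input  wire              we,
    input  wire [ADDR_W-1:0] wr_addr,
    input  wire [DATA_W-1:0] wr_data,
    input  wire              rd_clk,            // e.g. OutClock
    input  wire [ADDR_W-1:0] rd_addr,
    output reg  [DATA_W-1:0] rd_data            // valid one rd_clk cycle after rd_addr
);
    reg [DATA_W-1:0] mem [0:(1<<ADDR_W)-1];

    always @(posedge wr_clk)
        if (we)
            mem[wr_addr] <= wr_data;            // synchronous write

    always @(posedge rd_clk)
        rd_data <= mem[rd_addr];                // synchronous (registered) read
endmodule
```

    Because rd_data becomes valid one rd_clk cycle after rd_addr is applied, the reading side has to request each word one cycle before it needs it.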
  • Altera_Forum

    Dear FvM,

    Thank you for your kind suggestions. I rewrote everything according to them and got my project to compile; however, I am still not sure that everything is OK.

    I got very impressive Fmax figures for several clocks in my design: 438 MHz for InReady and 153 MHz for the internal clock, which is InReady demultiplexed by 4, so I need at least 100 MHz there. In fact I have two designs: one runs with InReady clocked at 200 MHz, and the other differs slightly, with a wider reg [27:0] InData, and runs with a 400 MHz clock.

    What I cannot understand in my Quartus compilation is the following: I allocate reg [63:0] RingData [0:8191], which is 512 Kbit (64 x 8192 = 524,288 bits), yet the Flow Summary reports zero memory bits used.

    Could anybody tell me where my RingData and TimeStep arrays were allocated? In the RTL view they are marked as sync_ram, but if the 608 Kbit of internal Cyclone III memory were used, why does the Flow Summary show zero bits used?

    I am attaching the complete code of this project, the RTL view, and the Fmax and Flow summaries.

    Sincerely,

    Ilghiz

    
    module My_First_Project (InData, InReady, OutClock, OutData);
    parameter MaxK=8;
    parameter BLKSIZE=1024;
    parameter MaxN=MaxK*BLKSIZE;
    input   InData;
    input  InReady, OutClock;
    output OutData;
    reg    OutData;
    reg signed  LD0, LD1, LD2, LD3, LD4;
    reg signed  PE, NE;
    reg  DeMuxCounter;
    reg  H00, H01, H02, H03, H04, H05, H06, H07, H08, H09;
    reg  H10, H11, H12, H13, H14, H15, H16, H17, H18, H19;
    reg  H20, H21, H22, H23, H24, H25, H26, H27, H28, H29;
    reg  H30, H31, H32, H33, H34, H35, H36, H37, H38, H39;
    reg  SumS1, SumS2;
    reg  LevelS1, LevelS2;
    reg signed  x1, x2, x3, x4;
    reg  y1, y2, y3, y4, yy1, yy2, yyy, LS1, LS2;
    reg  z1, z2, z3, z4, zz1, zz2, zzz;
    reg  NadoT;
    reg  CurTime;
    reg  RingData ;
    reg  TimeStep ;
    reg  BeginPos;
    reg  EndPos;
    reg  CurShiftData;
    reg   CurPos;
    reg        EndPosSw;
    initial
    begin
      LD4<=0;
      CurTime=0;
      SumS1=1;
      SumS2=1;
      LevelS1=1;
      LevelS2=1;
      NadoT=0;
      DeMuxCounter=0;
      BeginPos=0;
      EndPos=0;
      CurPos=79;
      EndPosSw=1;
    end
    always @(posedge InReady)
    begin
      PE <= {PE, InData};
    end
    always @(negedge InReady)
    begin
      NE <= {NE, InData};
      DeMuxCounter<=DeMuxCounter+1'b1;
    end
    always @(posedge DeMuxCounter)
    begin
      if(NadoT)
      begin
        RingData=H39;
        NadoT=NadoT-1'b1;
        if(BeginPos==0) TimeStep]=CurTime;
        BeginPos=BeginPos+1'b1;
      end
      begin
        LD0<=LD4;
        LD1<={PE, NE, PE, NE, PE, NE, PE, NE,
              PE, NE, PE, NE, PE, NE, PE, NE};
        LD2<={PE, NE, PE, NE, PE, NE, PE, NE,
              PE, NE, PE, NE, PE, NE, PE, NE};
        LD3<={PE, NE, PE, NE, PE, NE, PE, NE,
              PE, NE, PE, NE, PE, NE, PE, NE};
        LD4<={PE, NE, PE, NE, PE, NE, PE, NE,
              PE, NE, PE, NE, PE, NE, PE, NE};
      end
      begin
        {H00, H01, H02, H03, H04, H05, H06, H07, H08, H09,
         H10, H11, H12, H13, H14, H15, H16, H17, H18, H19,
         H20, H21, H22, H23, H24, H25, H26, H27, H28, H29,
         H30, H31, H32, H33, H34, H35, H36, H37, H38, H39} <=
        {LD1, LD2, LD3, LD4,
         H00, H01, H02, H03, H04, H05, H06, H07, H08, H09,
         H10, H11, H12, H13, H14, H15, H16, H17, H18, H19,
         H20, H21, H22, H23, H24, H25, H26, H27, H28, H29,
         H30, H31, H32, H33, H34, H35, H36, H37, H38};
        x1<=LD0-LD1;
        x2<=LD1-LD2;
        x3<=LD2-LD3;
        x4<=LD3-LD4;
        y1<=LD1*LD1;
        y2<=LD2*LD2;
        y3<=LD3*LD3;
        y4<=LD4*LD4;
     
        SumS1<=SumS1-(SumS1>>3);
        SumS2<=SumS2-(SumS2>>3);
     
        LS1<=LevelS1-(LevelS1>>3);
        LS2<=LevelS2-(LevelS2>>3);
        CurTime<=CurTime+1;
      end
     
      begin
        z1<=x1*x1;
        z2<=x2*x2;
        z3<=x3*x3;
        z4<=x4*x4;
        yy1<=y1+y2;
        yy2<=y3+y4;
      end
     
      begin
        zz1<=z1+z2;
        zz2<=z3+z4;
        yyy<=yy1+yy2;
      end
     
      begin
        zzz<=zz1+zz2;
        SumS1<=SumS1+(yyy>>5);
      end
      begin
        SumS2<=SumS2+(zzz>>5);
      end
     
      begin
        if(SumS1>=LevelS1 && SumS2>=LevelS2)
        begin
          LevelS1<=LS1+(SumS1>>5);
          LevelS2<=LS2+(SumS2>>5);
          NadoT<=NadoT+3072;
        end
      end
    end
    always @(OutClock)
    begin
      begin
        {CurShiftData, OutData}<=CurShiftData;
      end
      if(CurPos) CurPos<=CurPos-1'b1;
      else
      begin
        CurPos<=79;
        CurShiftData<=EndPos;
        if(EndPos==0 && EndPosSw)
        begin
          CurShiftData=TimeStep];
          EndPosSw=0;
        end
        else
        begin
          CurShiftData=RingData;
          EndPosSw=1;
          EndPos=EndPos+1'b1;
        end
      end
    end
    endmodule
    
  • Altera_Forum

    You have several issues in the code below:

    - it's combinational (no edge-sensitive condition)

    - CurShiftData is the RAM output register and a shift register at the same time. That doesn't work.

    Because I don't know what you actually want to achieve here, I can't suggest a solution.

    always @(OutClock)
    begin
      begin
        {CurShiftData, OutData}<=CurShiftData;
      end
      if(CurPos) CurPos<=CurPos-1'b1;
      else
      begin
        CurPos<=79;
        CurShiftData<=EndPos;
        if(EndPos==0 && EndPosSw)
        begin
          CurShiftData=TimeStep];
          EndPosSw=0;
        end
        else
        begin
          CurShiftData=RingData;
          EndPosSw=1;
          EndPos=EndPos+1'b1;
        end
      end
    end
  • Altera_Forum

    Dear FvM,

    Thank you for your reply. Honestly, I cannot work out what to do; I am new to FPGAs. I need to implement the following algorithm:

    On positive or on negative edges I should send one bit from the 80-bit array CurShiftData;

    if there is no data left in CurShiftData, I need to refill CurShiftData according to the following rules:

    bits [79:77] are zeros,

    bits [76:64] correspond to EndPos (I am using [79:64]<=EndPos; am I right?),

    the remaining 64 bits [63:0] are taken from the TimeStep array once every 1024 words, and from RingData otherwise.
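    A sketch of how one 80-bit word could be assembled from these rules with a single concatenation, assuming EndPos is a 13-bit unsigned value (8192 entries) in [76:64], which leaves 80 - 13 - 64 = 3 zero bits in [79:77]. The module name, port names, the use_time select, and the 64-bit TimeStep word width are hypothetical. As for the question above: in Verilog, assigning a 13-bit unsigned EndPos to the 16-bit slice [79:64] zero-extends it, so that form also leaves [79:77] at zero.

```verilog
// Hypothetical frame builder: {3 zero bits, 13-bit EndPos, 64-bit payload}.
module frame80 (
    input  wire [12:0] end_pos,     // EndPos
    input  wire        use_time,    // 1 once every 1024 words: payload from TimeStep
    input  wire [63:0] time_word,   // word read from TimeStep
    input  wire [63:0] ring_word,   // word read from RingData
    output wire [79:0] frame
);
    assign frame = {3'b000, end_pos, use_time ? time_word : ring_word};
endmodule
```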

    Actually, I have no feeling for what your comment about the RAM output register means. Yes, I understand that it is a shift register, and it is probably somehow also tied to the RAM, but I cannot tell for myself whether it is RAM or not, so I cannot figure out how to fix this problem. Please help me!

    Sincerely,

    Ilghiz

  • Altera_Forum

    As a general remark, your code is rather complex for a "First_Project". I hope you're able to solve the problems involved without too much frustration. Most people start learning HDL programming with more basic design problems.

    --- Quote Start ---

    on positive or on negative edges I should send one bit

    --- Quote End ---

    I now understand why you wrote always @(OutClock), but unfortunately it's not synthesizable. You need a kind of DDIO (double-data-rate) output register. I present a solution in principle in the code snippet below (registering two bits and using a multiplexer to select the right output bit for each clock phase); for high OutClock speeds, explicit instantiation of a DDIO primitive may be required.

    The other point is to meet the requirements for RAM inference. Below is a construct that the Quartus compiler accepts, but I'm not sure whether it's acceptable for your design to register the RAM output one clock cycle in advance. If it doesn't work this way, you have to use a different construct that reserves one clock cycle of delay for the RAM read.

    always @(posedge OutClock)
    begin
      begin
        {CurShiftData, OutData_n,OutData_p}<=CurShiftData;
      end
      if(CurPos)
      begin 
        CurPos<=CurPos-1'b1;  // non-blocking, like the other registers in this block
      end
      else
      begin
        CurPos<=79;
        CurShiftData<=EndPos;
        if(EndPos==0 && EndPosSw)
        begin
          EndPosSw<=0;
          CurShiftData<=CurShiftData_s1;
        end
        else
        begin
          EndPosSw<=1;
          EndPos<=EndPos+1'b1;
          CurShiftData<=CurShiftData_s2;
        end
      end
      CurShiftData_s1<=TimeStep];
      CurShiftData_s2<=RingData;
    end
    assign OutData = (OutClock)?OutData_p:OutData_n;
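    A self-contained sketch of the same output principle, assuming each 80-bit word is sent LSB first with two bits per OutClock period, and that the next word is already valid when it is loaded (for example, read from a registered RAM output one cycle earlier). Since two bits leave per rising edge, a word lasts 40 edges, so this sketch reloads its counter with 39 rather than 79. Module and port names are hypothetical, and as noted above, a DDIO primitive is the safer choice than a clock-driven multiplexer at high OutClock rates.

```verilog
// Hypothetical DDR-style serializer: 80-bit word, 2 bits per clock period.
module ddr_serializer (
    input  wire        clk,         // OutClock
    input  wire [79:0] next_word,   // word to send after the current one
    output wire        load,        // high in the clock where next_word is taken
    output wire        dout         // serial output, LSB first
);
    reg [79:0] shreg = 80'd0;
    reg [5:0]  cnt   = 6'd0;
    reg        bit_p = 1'b0;        // shown while clk is high
    reg        bit_n = 1'b0;        // shown while clk is low

    assign load = (cnt == 6'd0);

    always @(posedge clk) begin
        bit_p <= shreg[0];          // the two lowest bits leave on every rising edge
        bit_n <= shreg[1];
        if (load) begin
            shreg <= next_word;     // reload takes priority over the shift
            cnt   <= 6'd39;         // 40 edges x 2 bits = 80 bits per word
        end else begin
            shreg <= shreg >> 2;
            cnt   <= cnt - 6'd1;
        end
    end

    // Same multiplexer principle as above: select the bit for each clock phase.
    assign dout = clk ? bit_p : bit_n;
endmodule
```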
  • Altera_Forum

    Dear FvM,

    Thank you very much for your kind suggestion. Thanks to your help, and help from other forums, I was able to rewrite this example so that it compiles and looks reasonable in the RTL view.

    Regarding "My_First_Project": it really is my first project; I had never used Verilog, VHDL, or any other hardware description language before. However, it is a really simple algorithm compared to my goal, a QR-like algorithm. I hope my experience in numerical mathematics helps me implement it fast enough :)

    Sincerely,

    Ilghiz