Is there a performance difference between variables vs direct assignments

Several years ago I wrote a small (I thought) 8-bit microcontroller core called the Open8. The current model is based very closely on the old Arc-lite/V8 instruction set, though I've modified a few instructions to be more useful. It's been a handy micro for packet generation, programmable state machines, etc. However, the fMax of designs using it has always been low in the target devices (usually a Cyclone III).

I observed that most of the timing issues were related to the ALU, specifically where the instruction decoder was setting up the ALU, so I set about pipelining that code more heavily. In the process, I moved from a two-process model, where the combinatorial logic was in a separate process, to a one-process model. For most designs this seemed to perk up the synthesis results nicely, and I did get a nice speed boost, but I'm wondering whether I have created a new limitation in the process.

To avoid redesigning everything from scratch, I opted to use variables where I had previously used signals between two separate processes. So, for example, instead of using Cache_Ctrl <= CACHE_PREFETCH; in the separate combinatorial process, I used Cache_Ctrl := CACHE_PREFETCH; in the single clocked process. As an aside, I initialize all variables to their most common value at the top of the code and modify them as needed in each state. Below the FSM code is the code that manages these subsystems. This works fine, and the processor core is running in hardware.

To be clear, this is what I am doing:


  CPU_Proc: process( Clock, Reset )
    variable IC              : CACHE_MODES;
    -- ...
  begin
    if( Reset = Reset_Level )then
      -- ...
    elsif( rising_edge(Clock) )then
      IC                     := CACHE_IDLE;
      -- ...
      case( CPU.State )is
        -- ... (snippet from within primary FSM)
-------------------------------------------------------------------------------
-- Program Control (BR0_C1, BR1_C1, DBNZ_C1, JMP )
-------------------------------------------------------------------------------
        when BRN_C1 =>
          if( CPU.Flags(Reg) = CPU.Opcode(0) )then
            IC               := CACHE_IDLE;
            PC.Offset        := CPU.Operand1;
            CPU.State        <= PIPE_FILL_0;
          else
            IC               := CACHE_INSTR;
            CPU.State        <= INSTR_DECODE;
          end if;
        -- ...
-------------------------------------------------------------------------------
-- Instruction/Operand caching for pipelined memory access
-------------------------------------------------------------------------------
      case IC is
        when CACHE_INSTR =>
          CPU.Opcode         <= Rd_Data(7 downto 3);
          CPU.SubOp_p0       <= Rd_Data(2 downto 0);
          CPU.SubOp_p1       <= Rd_Data(2 downto 0) + 1;
          if( CPU.Cache_Valid = '1' )then
            CPU.Opcode       <= CPU.Prefetch(7 downto 3);
            CPU.SubOp_p0     <= CPU.Prefetch(2 downto 0);
            CPU.SubOp_p1     <= CPU.Prefetch(2 downto 0) + 1;
            CPU.Cache_Valid  <= '0';
          end if;
        when CACHE_OPER1 =>
          CPU.Operand1       <= Rd_Data;
        when CACHE_OPER2 =>
          CPU.Operand2       <= Rd_Data;
        when CACHE_PREFETCH =>
          CPU.Prefetch       <= Rd_Data;
          CPU.Cache_Valid    <= '1';
        when CACHE_INVALIDATE =>
          CPU.Cache_Valid    <= '0';
        when CACHE_IDLE =>
          null;
      end case;
      -- ...
    end if;
  end process;

I'm just not clear on whether or not this coding style may be causing synthesis issues.

Thanks for any advice!

-Seth

13 Replies

Altera_Forum
Honored Contributor
12 years ago
Remember, whether you use variables or signals, the result will be combinatorial logic or registers. The less logic there is between registers, the higher the fmax. Variables vs signals is mostly irrelevant, because it's more a question of how you use those variables. Signal assignments in a clocked process will always produce a register. Variables in a clocked process can produce a register if used in one way, but they can also produce combinatorial logic if used another way. Variables have instant assignment, so if they are assigned a value and then "read" further down the same bit of code, they will produce logic:

-- clocked process
a := ip0 and ip1;
op <= a;

Variable a here will produce logic, with the "op" assignment producing the register (latency 1). But if ordered like this:

-- clocked process
op <= a;
a := ip0 and ip1;

a will produce a register AND op will produce a register (latency 2), because a is read before it is updated.
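To make the two orderings concrete, here is a minimal sketch that puts both versions side by side. The entity and port names (`var_order_demo`, `ip0`, `ip1`, `op_lat1`, `op_lat2`) are hypothetical. Following the reasoning above, `op_lat1` should follow `ip0 and ip1` one clock later and `op_lat2` two clocks later; the RTL Viewer is the place to confirm what Quartus actually builds.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical demo: the same AND gate written two ways in a clocked process.
entity var_order_demo is
  port (
    clk     : in  std_logic;
    ip0     : in  std_logic;
    ip1     : in  std_logic;
    op_lat1 : out std_logic;   -- variable written before it is read
    op_lat2 : out std_logic    -- variable read before it is written
  );
end entity var_order_demo;

architecture rtl of var_order_demo is
begin
  -- Write-then-read: 'a' is just a name for the AND gate output,
  -- so only op_lat1 is a flip-flop (one clock of latency).
  write_first : process (clk)
    variable a : std_logic;
  begin
    if rising_edge(clk) then
      a       := ip0 and ip1;
      op_lat1 <= a;
    end if;
  end process write_first;

  -- Read-then-write: op_lat2 receives the value 'a' kept from the previous
  -- edge, so 'a' needs a flip-flop of its own and op_lat2 is a second one
  -- behind it (two clocks of latency).
  read_first : process (clk)
    variable a : std_logic := '0';
  begin
    if rising_edge(clk) then
      op_lat2 <= a;
      a       := ip0 and ip1;
    end if;
  end process read_first;
end architecture rtl;
```

Driving both inputs high and clocking once should show `op_lat1 = '1'` while `op_lat2` is still `'0'`; only after the second edge does `op_lat2` go high.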

I notice in your code the "IC" variable is assigned a value and then used in a case statement. If IC were a signal, or were assigned after the case statement, it would add a stage of pipelining. It's really difficult to tell without the rest of the code what else is happening, but generally, unless you know exactly what it's supposed to produce (i.e. the underlying hardware), you're better off sticking with signals and avoiding variables.
Altera_Forum
Honored Contributor
12 years ago
In this case, I actually want combinatorial logic, not registers. IOW, I need the logic that is set by these variables to complete on the current clock, so the variables are being used the same way signals between processes were used in the older model. (I have actually gone to a great deal of trouble to make sure that none of these variables inadvertently imply a flip-flop.) The variables were simply there to help with maintainability: rather than copying the bits of code into each state where they are used, the code is written once and effectively "called" from the states that need it. IOW, I could just as easily have written it so that every time I need to prefetch the next instruction, I just do a CPU.Prefetch <= Rd_Data; instead.

I believe it should produce identical netlists, but I know that sometimes the synthesizer can have issues with certain code models. The question had more to do with how Quartus treats variables used in this manner, so I may have posted in the wrong area.
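As a rough illustration of that equivalence claim, here is a hypothetical toy entity (`cache_demo`, with made-up inputs `fetch` and `oper` standing in for the FSM decision) written both ways: a two-process architecture where the cache command is a signal, and a one-process architecture where it is a variable assigned before the `case`. Because the variable is written on every clock before it is read, it should need no storage, so the two architectures are expected to behave identically in simulation. Whether Quartus emits the same netlist is an assumption to check, not something this sketch proves.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical toy model: 'fetch' and 'oper' stand in for FSM decisions,
-- and the cache command selects which register captures rd_data.
entity cache_demo is
  port (
    clk      : in  std_logic;
    fetch    : in  std_logic;
    oper     : in  std_logic;
    rd_data  : in  std_logic_vector(7 downto 0);
    opcode   : out std_logic_vector(7 downto 0);
    operand1 : out std_logic_vector(7 downto 0)
  );
end entity cache_demo;

-- Older style: a combinational process drives the command as a signal.
architecture two_proc of cache_demo is
  type cache_modes is (CACHE_IDLE, CACHE_INSTR, CACHE_OPER1);
  signal ic : cache_modes;
begin
  decode : process (fetch, oper)
  begin
    ic <= CACHE_IDLE;                  -- most common value first
    if fetch = '1' then
      ic <= CACHE_INSTR;
    elsif oper = '1' then
      ic <= CACHE_OPER1;
    end if;
  end process decode;

  regs : process (clk)
  begin
    if rising_edge(clk) then
      case ic is
        when CACHE_INSTR => opcode   <= rd_data;
        when CACHE_OPER1 => operand1 <= rd_data;
        when CACHE_IDLE  => null;
      end case;
    end if;
  end process regs;
end architecture two_proc;

-- Newer style: the command is a variable written before it is read in the
-- same clocked process, so it names combinational logic and stores nothing.
architecture one_proc of cache_demo is
  type cache_modes is (CACHE_IDLE, CACHE_INSTR, CACHE_OPER1);
begin
  regs : process (clk)
    variable ic : cache_modes;
  begin
    if rising_edge(clk) then
      ic := CACHE_IDLE;                -- most common value first
      if fetch = '1' then
        ic := CACHE_INSTR;
      elsif oper = '1' then
        ic := CACHE_OPER1;
      end if;
      case ic is
        when CACHE_INSTR => opcode   <= rd_data;
        when CACHE_OPER1 => operand1 <= rd_data;
        when CACHE_IDLE  => null;
      end case;
    end if;
  end process regs;
end architecture one_proc;
```

Instantiating `cache_demo(two_proc)` and `cache_demo(one_proc)` side by side in a testbench and comparing their outputs is a cheap way to catch an ordering slip before looking at the RTL Viewer.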
Altera_Forum
Honored Contributor
12 years ago
Quartus is quite good with VHDL behaviour, so using variables or signals in an unclocked process will make no difference. The problem is that you have too much logic between registers.
Altera_Forum
Honored Contributor
12 years ago
Not so much a problem, as the processor model works fine as it stands. I have shipping designs running with it at 66MHz with margin to spare. I am looking at doing some newer designs where I would like to target 100MHz, though, and I was having trouble meeting that with the current model. Assuming no more bugs show up in regression testing, my new model is hitting 112MHz in the same design. The critical path is now largely between the address bus and the ROM, though there are still some low margin figures in the ALU core; but since I'm making timing, I'm not sure it's worth the pain of continuing to optimize the model.

My biggest issue now is deciding how badly the increased instruction latency is going to cost me. Math operations that used to stream through in a single clock cycle are now taking three clock cycles to complete because I am having to back up the program counter to account for the additional clock cycle in the ALU path.

I just wanted to make sure I wasn't throwing anything away by using variables in a single clocked process instead of the more traditional signals between a clocked and combinatorial process.
Altera_Forum
Honored Contributor
12 years ago
Like I said, it's nothing to do with signals or variables; it's behaviour, and how that behaviour maps to logic.
With good pipelining, 200-250 MHz should be possible on a Cyclone III, so a couple of extra clocks of latency could still end up completing faster at that speed than fewer cycles at a slower clock. That's where you have to work out the trade-off.
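One way to weigh that trade-off is to compare time per operation rather than cycle counts. Taking 66 MHz as a stand-in for the old single-cycle ALU and 112 MHz for the new three-cycle path (an assumption: those two figures come from different designs in this thread), a single math instruction in isolation works out as:

```latex
\begin{aligned}
t_{\mathrm{op}} &= \frac{N_{\mathrm{cycles}}}{f_{\mathrm{clk}}} \\
\text{old core: } \frac{1}{66\ \mathrm{MHz}} &\approx 15.2\ \mathrm{ns} \\
\text{new core: } \frac{3}{112\ \mathrm{MHz}} &\approx 26.8\ \mathrm{ns} \\
\frac{3}{200\ \mathrm{MHz}} &= 15.0\ \mathrm{ns}, \qquad \frac{3}{250\ \mathrm{MHz}} = 12.0\ \mathrm{ns} \\
f_{\text{break-even}} &= \frac{N_{\text{new}}}{N_{\text{old}}}\, f_{\text{old}} = 3 \times 66\ \mathrm{MHz} = 198\ \mathrm{MHz}
\end{aligned}
```

On these numbers, the three-cycle path only matches the old single-cycle time at about 198 MHz, just below the 200-250 MHz range mentioned above; overall throughput also depends on how often the longer ALU path is actually exercised.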
Altera_Forum
Honored Contributor
12 years ago
You may want to check the synthesis results in the 'RTL Viewer' (Tools -> Netlist Viewers -> RTL Viewer). It will show whether the variables have inferred registers and whether it is what you wanted.
Altera_Forum
Honored Contributor
12 years ago
As pointed out, the difference between a variable and a signal is not a matter of which is better; the final inferred logic/registers can be different, and the functionality may change.

Here is my view of variables versus signals, away from the process-update mindset (needs some work to prove it).
Example 1:

(i) count <= count + 1;
count is inferred as a register. An adder adds 1 to this register's output, and the sum is wired back into its input.
(ii) count := count + 1;
count itself is never a register, but a register is needed after it because its previous value must be stored; this register is not known to the tool by the name count.
The final inferred structure of the two counting cases (i) and (ii) is identical.
Example 2:
(i)
D <= A;
A := B + C;
A is never a register (it is just the output of the adder B + C).
D is a register of its own and acquires the old value of A.
Hence there is a register after A, but it is not known as A to the tool. Thus D is a second register after the register placed on A:
B + C => A => register => register(D)
(ii)
A := B + C;
D <= A;
D acquires the new value of A; thus D is a register directly after A:
B + C => A => register(D)

So, in short, a variable allows combinatorial logic within a clocked process, and it may or may not infer a register.
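Example 1 can be written out as a small simulatable sketch. The entity name `counter_demo` and its ports are hypothetical. Both processes should count identically from the same initial value; in the variable version, `v` is read (on the right-hand side) before it is written, so its value has to survive between clock edges and synthesis needs a register for it even though no signal called `v` exists.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical demo of Example 1: the same 8-bit counter as a signal and as a variable.
entity counter_demo is
  port (
    clk       : in  std_logic;
    count_sig : out unsigned(7 downto 0);
    count_var : out unsigned(7 downto 0)
  );
end entity counter_demo;

architecture rtl of counter_demo is
  signal count : unsigned(7 downto 0) := (others => '0');
begin
  -- (i) Signal: 'count' is the register; the +1 adder feeds its input.
  sig_count : process (clk)
  begin
    if rising_edge(clk) then
      count <= count + 1;
    end if;
  end process sig_count;
  count_sig <= count;

  -- (ii) Variable: 'v' is read before it is written, so its old value must be
  -- stored between edges; the tool infers a register for it without the name 'v'.
  -- count_var holds the same value as that hidden register.
  var_count : process (clk)
    variable v : unsigned(7 downto 0) := (others => '0');
  begin
    if rising_edge(clk) then
      v         := v + 1;
      count_var <= v;
    end if;
  end process var_count;
end architecture rtl;
```

After three clock edges both `count_sig` and `count_var` should read 3.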
Altera_Forum
Honored Contributor
12 years ago
FWIW, the Nios CPU uses the 'clock enable' signal on tightly coupled memory blocks in order to keep the old output even though the address has changed.
This means that when the pipeline stalls, it doesn't have to regenerate any addresses.
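As a generic illustration of that idea (not the Nios II implementation), here is a sketch of a block RAM with a registered read port and a clock enable. The entity `ram_ce`, its generics, and the `ce` port are hypothetical. While `ce` is low the output register is not updated, so `q` keeps the word fetched before the stall even if `addr` has already moved on. Whether Quartus maps `ce` onto the block RAM's own clock-enable input, rather than building extra logic around it, is an assumption to check in the RTL Viewer or fitter report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical single-port RAM with a registered read port and a clock enable.
entity ram_ce is
  generic (
    ADDR_W : positive := 8;
    DATA_W : positive := 8
  );
  port (
    clk  : in  std_logic;
    ce   : in  std_logic;   -- low = pipeline stalled: no write, q holds
    we   : in  std_logic;
    addr : in  std_logic_vector(ADDR_W-1 downto 0);
    d    : in  std_logic_vector(DATA_W-1 downto 0);
    q    : out std_logic_vector(DATA_W-1 downto 0)
  );
end entity ram_ce;

architecture rtl of ram_ce is
  type mem_t is array (0 to 2**ADDR_W - 1) of std_logic_vector(DATA_W-1 downto 0);
  signal mem : mem_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if ce = '1' then
        if we = '1' then
          mem(to_integer(unsigned(addr))) <= d;
        end if;
        -- On a simultaneous write, q gets the old contents (read-before-write).
        q <= mem(to_integer(unsigned(addr)));
      end if;
    end if;
  end process;
end architecture rtl;
```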
Altera_Forum
Honored Contributor
12 years ago
--- Quote Start ---
needs some work to prove it
--- Quote End ---

Wouldn't you do that first then, before possibly confusing the audience?
Altera_Forum
Honored Contributor
12 years ago
--- Quote Start ---
Wouldn't you do that first then, before possibly confusing the audience?
--- Quote End ---

Just done, and it is as I described, both for the counter case and for the A, B, C, D case.

I don't think we always have time to prove what we think is right before posting to the forum, or do you?
