Forum Discussion

Altera_Forum
9 years ago

Optimal ALU

Dear all,

I'm creating an ALU for a homemade NIOS II compatible processor with Quartus Prime, targeting the Cyclone V. I discovered that the ALU's maximum frequency (Fmax) depends on the operation ordering. For example, for an ALU supporting four operations (addition, subtraction, logical OR and logical AND), the fastest design is the one that implements them in the following order:

Index  Operation
00     ADD
01     SUB
10     OR
11     AND

I've tested all combinations, which can't reasonably be done for bigger designs. Is there a way to find the optimal operation ordering without having to try every possible combination?

Thanks in advance!

8 Replies

  • Altera_Forum

    The ALU's operation is selected by a control signal. In the example above, the ALU performs addition when that signal equals 0, subtraction when it equals 1, and so on. By "operation ordering" I mean the mapping between the ALU's control signal and the operations, i.e. which operation the ALU performs when that control signal equals XXX.
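Since the question is about trying many encodings, one way to avoid rewriting the case statement for every attempt is to turn the mapping into parameters. The sketch below is a hypothetical example, not code from the thread: the module name `AluParam`, the `OP_*` parameters and the 32-bit default width are assumptions. It does not reduce the number of combinations, but each one becomes a parameter override that a build script could loop over.

```systemverilog
// Hypothetical sketch: the opcode-to-operation mapping is a set of
// parameters, so each encoding is a parameter override, not a code edit.
module AluParam #(
    parameter int         WIDTH  = 32,    // assumed operand width
    parameter logic [1:0] OP_ADD = 2'b00, // defaults follow the fastest
    parameter logic [1:0] OP_SUB = 2'b01, // ordering from the first post
    parameter logic [1:0] OP_OR  = 2'b10,
    parameter logic [1:0] OP_AND = 2'b11
) (
    input  logic             clock,
    input  logic [WIDTH-1:0] operand1, operand2,
    input  logic [1:0]       control,
    output logic [WIDTH-1:0] result
);
    always_ff @(posedge clock) begin
        // Case items are parameters; keep the four values distinct,
        // otherwise the first matching item wins and a later one is dead.
        case (control)
            OP_ADD: result <= operand1 + operand2;
            OP_SUB: result <= operand1 - operand2;
            OP_OR:  result <= operand1 | operand2;
            OP_AND: result <= operand1 & operand2;
        endcase
    end
endmodule
```

For example, `AluParam #(.OP_ADD(2'b11), .OP_AND(2'b00)) u_alu (...)` swaps ADD and AND while keeping SUB and OR where they were.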

  • Altera_Forum

    OK, but I fail to see how the exact encoding value of each operation matters, given the way FPGAs work.

    Are you using a prioritized case statement to generate the logic, or is it a purely logical operation?

    Without seeing the code you wrote, it is hard to make any substantial comment on the results you see.

    Examples (in Verilog). The four forms below are alternatives for the same 4:1 selection, so only one of them should drive s in a given module:

    reg [7:0] a, b;  // operands
    reg [7:0] s;     // result
    reg [1:0] f;     // function select

    // unordered:
    always @* s = ({8{f==0}} & (a+b)) | ({8{f==1}} & (a-b)) | ({8{f==2}} & (a|b)) | ({8{f==3}} & (a&b));

    always @*
      case (f)
        0: s = a+b;
        1: s = a-b;
        2: s = a|b;
        3: s = a&b;
      endcase

    // prioritized/ordered:
    always @* s = (f == 0) ? (a+b) : ((f == 1) ? (a-b) : ((f == 2) ? (a|b) : (a&b)));

    always @*
      if      (f == 0) s = a+b;
      else if (f == 1) s = a-b;
      else if (f == 2) s = a|b;
      else             s = a&b;
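In SystemVerilog, the "unordered" intent of the `case` form can also be stated explicitly with `unique case`, which declares that exactly one case item matches, so synthesis does not need to build a priority chain, and a simulator reports a violation if that ever fails. A minimal combinational sketch, assuming the same 8-bit operands as the examples above; the module name `AluComb` is hypothetical, and whether it changes the Quartus results discussed in this thread was not tested.

```systemverilog
// Hypothetical sketch: parallel (non-prioritized) 4:1 ALU selection.
module AluComb (
    input  logic [7:0] a, b,  // operands (8 bits, as in the examples above)
    input  logic [1:0] f,     // function select
    output logic [7:0] s      // result
);
    always_comb begin
        // unique: items are mutually exclusive and one of them must match.
        unique case (f)
            2'd0: s = a + b;
            2'd1: s = a - b;
            2'd2: s = a | b;
            2'd3: s = a & b;
        endcase
    end
endmodule
```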

  • Altera_Forum

    --- Quote Start ---

    Unordered!

    --- Quote End ---

    Then this does not make sense to me. If your implementation is indeed written in the 'unordered' style (as in my examples above), then which operation is selected by code 0, 1, 2 or 3 should make no difference, as the LAB logic can decode any of the bit patterns equally easily.

    So you need to provide much more detail on what you are implementing (example code of yours) and what the fitting results are; what you have reported so far is too general.
  • Altera_Forum

    The following SystemVerilog code was compiled with Quartus Prime 16.0.0, targeting the Cyclone V 5CGXFC5C6F27C7. The inputs are synchronous.

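    module Alu #(
        parameter int WIDTH = 32  // placeholder: the operand width did not survive in the original post
    ) (
        input  logic [WIDTH-1:0] operand1, operand2,
        input  logic [1:0]       control,  // four operations need a 2-bit select
        input  logic             clock,
        output logic [WIDTH-1:0] result
    );
        always_ff @(posedge clock) begin
            case (control)
                'h0: result <= operand1 + operand2;
                'h1: result <= operand1 - operand2;
                'h2: result <= operand1 & operand2;
                'h3: result <= operand1 | operand2;
            endcase
        end
    endmodule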

    For this ALU, the difference between the fastest and the slowest encoding is a few MHz, around an average Fmax of 800 MHz. However, for a bigger ALU implementing all of the NIOS II arithmetic (except multiply and divide) and logical operations, the fastest design I found is clocked at 200 MHz while the slowest runs at 150 MHz. That's huge, just from changing the operation order!
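One possible, unconfirmed reason the encoding could matter for ADD and SUB: with ADD = 00 and SUB = 01 (the fastest ordering in the first post), bit 0 of the select alone tells them apart and can feed a shared adder directly, using the two's complement identity a - b = a + ~b + 1. The sketch below shows that structure; `AluShared` and `WIDTH` are hypothetical names, and nothing in the thread establishes that Quartus builds this structure or that it explains the 150 to 200 MHz spread.

```systemverilog
// Hypothetical sketch of a shared adder/subtractor driven by the select.
// Encoding assumed: 00 ADD, 01 SUB, 10 OR, 11 AND.
module AluShared #(
    parameter int WIDTH = 32            // assumed operand width
) (
    input  logic [WIDTH-1:0] a, b,
    input  logic [1:0]       f,         // function select
    output logic [WIDTH-1:0] s
);
    logic [WIDTH-1:0] b_mux;
    logic [WIDTH-1:0] sum;

    // For SUB (f[0] = 1) invert b; f[0] is also the carry-in (the "+1").
    assign b_mux = b ^ {WIDTH{f[0]}};
    assign sum   = a + b_mux + f[0];

    // f[1] chooses between the adder and the logic operations.
    assign s = f[1] ? (f[0] ? (a & b) : (a | b)) : sum;
endmodule
```

With another encoding (say SUB = 11), the inversion and carry-in would need a small decode of both select bits in front of the adder instead of a single wire.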
  • Altera_Forum

    Did you read Joysyb's document?

    If you use timing requirements to specify the frequency at which your system should run (and if you don't, you should definitely start with that), then all combinations should be able to reach the same target frequency.

    If you only look at Fmax, even a tiny change in your code will give different results. With its default settings, Quartus just tries to reach the target frequency and stops optimizing once it gets there.