Altera_Forum
17 years ago

custom instructions implementation

Hi everybody,

I'm trying to implement a MAC unit as a custom instruction for the Nios II that multiplies two operands and adds a third. I had no problems describing it in VHDL. Unfortunately, I'm now stuck on several other problems.

1. In SOPC Builder, after adding the MAC component to the CPU and generating the .ptf file, I build a syslib in the Nios II IDE using that .ptf.

At the beginning of the build it always says:

--- Quote Start ---

WARNING: module cpu_II_Mac_inst (Mac) not found in component directory (install.ptf)

--- Quote End ---

What does that mean? What should I add to install.ptf or to the component directory (and where is that?) so that it can find the module?

2. I need 3 operands, but the system.h generated by the Nios II IDE only lists a macro with 2 operands, even though I set the operands of the custom_instruction_slave to 3 in the component editor:

--- Quote Start ---

#define ALT_CI_MAC_INST(A,B) __builtin_custom_inii(ALT_CI_MAC_INST_N,(A),(B))

--- Quote End ---

3. If I edit that macro in system.h to support 3 integer operands:

--- Quote Start ---

#define ALT_CI_MAC_INST(A,B,c) __builtin_custom_iniii(ALT_CI_MAC_INST_N,(A),(B),(c))

--- Quote End ---

I get an undefined reference to that function. How do I add support for a third operand? Where are the built-in functions declared, and where is the function body, so that I can add a third operand manually?

How do I tell the compiler to run these function calls on the hardware described by the HDL file?

I would deeply appreciate any help on this topic, because I'm getting desperate.

Cheers,

Dash

4 Replies

  • Altera_Forum

    Dash,

    1. I don't know

    2. This may be related to your VHDL description. If it isn't a big secret, post it or send me a message.

    3. The compiler itself will never use your custom instructions to optimise code, because it was not built with any knowledge of them. You would have to change and rebuild the compiler for that (which you probably don't want to do).

    The only code that uses the custom instruction is code that calls the macro.

    (There is an exception for floating-point instructions, where Altera built support into the compiler; doing that is, however, a job for specialists.)
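To illustrate the point about calling the macro, here is a minimal sketch. It assumes the two-operand macro from the generated system.h in the first post. The index value 0, the helper names, and the software fallback (there only so the snippet also compiles on a PC) are assumptions for illustration and do not come from the generated header. The fallback also assumes the hardware simply multiplies its two operands.

```c
#include <stdint.h>

/* Assumed value; the real one is written into system.h by the IDE. */
#define ALT_CI_MAC_INST_N 0

#if defined(__nios2__)
/* On the Nios II target, GCC expands this builtin to a `custom` opcode. */
#define ALT_CI_MAC_INST(A, B) __builtin_custom_inii(ALT_CI_MAC_INST_N, (A), (B))
#else
/* Hypothetical host-side stand-in so the sketch can be built off-target. */
static int mac_inst_sw(int a, int b) { return a * b; }
#define ALT_CI_MAC_INST(A, B) mac_inst_sw((A), (B))
#endif

/* Plain C: the compiler emits an ordinary multiply here and never
 * substitutes the custom instruction on its own. */
static int multiply_plain(int a, int b)
{
    return a * b;
}

/* Explicit macro call: the only route by which the custom logic is used. */
static int multiply_custom(int a, int b)
{
    return ALT_CI_MAC_INST(a, b);
}
```

Both functions return the same value; only `multiply_custom` would end up as a `custom` opcode in the Nios II binary.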

    Stefaan
  • Altera_Forum

    Stefaan,

    thank you for your reply. I was away from home over the weekend, so I couldn't write here.

    My VHDL code isn't complex enough to explain the IDE's behaviour.

    I'll show you:

    --- Quote Start ---

    library IEEE;

    use IEEE.STD_LOGIC_1164.ALL;

    use IEEE.STD_LOGIC_ARITH.ALL;

    use IEEE.STD_LOGIC_UNSIGNED.ALL;

    library WORK;

    entity mac is

    Port ( A : in STD_LOGIC_VECTOR(15 downto 0);

    B : in STD_LOGIC_VECTOR(15 downto 0);

    C : in STD_LOGIC_VECTOR(31 downto 0);

    Q : out STD_LOGIC_VECTOR(31 downto 0);

    clk : in STD_LOGIC);

    end mac;

    architecture Behavioral of mac is

    signal Qs : STD_LOGIC_VECTOR(31 downto 0);

    begin

    MAC : Process (CLK)

    Begin

    If CLK'event and CLK = '1' Then

    Qs <= A*B;

    End If;

    End Process;

    Q <= Qs + C;

    end Behavioral;

    --- Quote End ---

    I guess I'll just experiment a little. Maybe I'll be lucky and find something.

    Thanks anyway again for the help.

    Cheers,

    Dash
  • Altera_Forum

    Dash25,

    You cannot make a custom instruction with 3 inputs the way you describe it for a MAC.

    What you need to do is work with 2 custom instructions.

    One accesses the accumulator register; the other multiplies the two operands and adds the product to the accumulator. You get the best performance if you register the multiplier output before adding.

    I'm not good at VHDL (I never use it), but in Verilog it should look as follows:

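    module custom(
      input  [15:0] a,          //widths are assumed (16-bit operands, 32-bit result); the post shows none
      input  [15:0] b,
      output reg [31:0] result, //unclocked
      input  [7:0] n,           //8 bits, so up to 256 instructions
      input start, 
      input clk, clk_e, reset,  
      output reg done  //unclocked
      );
    parameter ACC  = 0;
    parameter MULT = 1;  
      
    reg [31:0] accum;
    reg [31:0] mult;
    reg l_start;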
    always @(posedge clk or posedge reset)
    	if (reset)
    		begin
    			accum   <= 0;
    			l_start <= 0;
    		end
    	else
    		begin
    			l_start <= start;
    			if (start)
    				case (n)
    					ACC  : accum <= 0;
    					MULT : mult  <= a*b;
    					//possible to add others
    				endcase
    			if (l_start)
    				case (n)
    					//ACC : nothing to do
    					MULT : accum <= accum + mult; //non-blocking, like the rest of this clocked block
    				endcase
    		end
    				
    //the result output				
    always @*
    	case (n)
    		ACC     : result = accum;
    		//MULT    : not interesting
    		default : result = 0;
    	endcase
    	
    //control done behaviour 
    always @*
    	case (n)
    		//ACC : handled with default case;
    		MULT : done = l_start; //delay one cycle, because calculation still busy
    		default : done = 1;
    	endcase
    	
    				
    endmodule

    When you call custom instruction 0, the accumulator result is read out and the accumulator is reset.

    When you call custom instruction 1, the product of a and b is added to the accumulator.

    The a and b inputs are treated as UNSIGNED by this code!
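The behaviour described above can be sketched as a small software model, which may help when checking results against the hardware. The type and function names are hypothetical, and the 16-bit operand and 32-bit accumulator widths are assumptions taken from the C wrappers in this post.

```c
#include <stdint.h>

/* Hypothetical software model of the accumulator kept inside the custom logic. */
typedef struct {
    uint32_t accum;
} mac_model_t;

/* Models custom instruction 1: multiply two UNSIGNED 16-bit operands
 * and add the product to the accumulator (wraps modulo 2^32). */
static void mac_model_mult(mac_model_t *m, uint16_t a, uint16_t b)
{
    m->accum += (uint32_t)a * (uint32_t)b;
}

/* Models custom instruction 0: return the accumulator, then clear it. */
static uint32_t mac_model_read_and_reset(mac_model_t *m)
{
    uint32_t value = m->accum;
    m->accum = 0;
    return value;
}
```

Because the operands are unsigned, a value such as `(uint16_t)-1` is multiplied as 65535 and not as -1, so signed data would need separate handling.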

    You can write easy-to-use inline functions for the custom instructions, or use the ones the IDE provides...

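    static inline unsigned long GetAndResetMAC(void)
    {
        unsigned long retval;
        /* volatile: the read also clears the accumulator, so it must not be dropped or merged */
        __asm__ volatile ("custom 0, %0, r0, r0" : "=r" (retval));
        return retval;
    }
    static inline void MAC(unsigned short a, unsigned short b)
    {
        /* no output operands, so the inputs are numbered %0 and %1 */
        __asm__ volatile ("custom 1, r0, %0, %1" :: "r" (a), "r" (b));
    }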
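As a usage sketch for the two wrappers, here is how a dot product might look. It assumes instruction indices 0 and 1 as in the Verilog above. The function names and the off-target fallback (a plain C stand-in so the snippet also builds on a PC) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__nios2__)
/* On target: thin wrappers around the two custom opcodes (indices assumed). */
static inline uint32_t mac_read_and_reset(void)
{
    uint32_t value;
    __asm__ volatile ("custom 0, %0, r0, r0" : "=r" (value));
    return value;
}
static inline void mac_accumulate(uint16_t a, uint16_t b)
{
    __asm__ volatile ("custom 1, r0, %0, %1" :: "r" (a), "r" (b));
}
#else
/* Off target: hypothetical software stand-in with the same behaviour. */
static uint32_t mac_accum_sw;
static inline uint32_t mac_read_and_reset(void)
{
    uint32_t value = mac_accum_sw;
    mac_accum_sw = 0;
    return value;
}
static inline void mac_accumulate(uint16_t a, uint16_t b)
{
    mac_accum_sw += (uint32_t)a * (uint32_t)b;
}
#endif

/* Unsigned 16-bit dot product; the sum wraps modulo 2^32. */
static uint32_t dot_product_u16(const uint16_t *x, const uint16_t *y, size_t len)
{
    size_t i;

    (void)mac_read_and_reset();        /* discard any stale accumulator contents */
    for (i = 0; i < len; i++)
        mac_accumulate(x[i], y[i]);    /* one custom instruction per element pair */
    return mac_read_and_reset();       /* read the sum and leave the unit cleared */
}
```

Clearing the accumulator first costs one extra instruction, but it means the result does not depend on what an earlier caller left behind.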
    

    Notes :

    - I didn't test the code (so I give no warranty for its correctness), and I don't have the template with the signal names for a custom instruction at hand, so something may be missing; you'll figure out what I mean.

    - With n you can make up to 256 custom instructions, so it can be extended.

    - The "start" signal kicks off the instruction, and the processor waits for done to go high. That's why done for the multiply instruction is delayed by one cycle (l_start): it is the pipeline stage.

    - There is some excellent documentation on this from Altera.

    I hope this helps

    Stefaan
  • Altera_Forum

    Hi,

    Now, how would you put this instruction into the Nios processor?

    If you have an example, please post it to the forum.

    Thank you!