Altera_Forum
Honored Contributor
17 years ago
Dash25,
You cannot make a custom instruction with 3 inputs as you describe for a MAC. What you need to do is work with 2 custom instructions: one accesses the accumulator register, the other multiplies the two operands and adds the product to the accumulator. You will get the best performance if you register the multiplier output before adding. I'm not good at VHDL (because I never use it), but in Verilog it should look as follows:
module custom(
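    input      [31:0] a,      // operand A (32-bit custom instruction data bus)
    input      [31:0] b,      // operand B
    output reg [31:0] result, // unclocked (combinational)
    input      [7:0]  n,      // selects the custom instruction
    input             start,
    input             clk, clk_e, reset,
    output reg        done    // unclocked (combinational)
);
parameter ACC  = 0;
parameter MULT = 1;
reg [31:0] accum;   // accumulator
reg [31:0] mult;    // registered multiplier output
reg        l_start; // start delayed by one clock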
always @(posedge clk or posedge reset)
    if (reset)
    begin
        accum   <= 0;
        mult    <= 0;
        l_start <= 0;
    end
    else
    begin
        // only the multiply needs a second cycle, so only it sets l_start
        l_start <= start && (n == MULT);
        if (start)
            case (n)
                ACC  : accum <= 0;
                MULT : mult <= a*b;
                //possible to add others
            endcase
        if (l_start)
            case (n)
                //ACC : nothing to do
                MULT : accum <= accum + mult;
            endcase
    end
//the result output
always @*
    case (n)
        ACC     : result = accum;
        //MULT : not interesting
        default : result = 0;
    endcase
//control done behaviour
always @*
    case (n)
        //ACC : handled with default case;
        MULT    : done = l_start; //delay one cycle, because calculation still busy
        default : done = 1;
    endcase
endmodule
When you call custom instruction 0, the accumulator result is read out and the accumulator is reset. When you call custom instruction 1, the product of a and b is added to the accumulator. The a and b inputs are treated as UNSIGNED by this code! You can write easy-to-use inline functions for the custom instructions, or use the ones provided by the IDE...
inline unsigned long GetAndResetMAC()
{
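    unsigned long retval;
    /* volatile: the instruction also clears the accumulator, so it must not be optimized away */
    __asm__ volatile ("custom 0, %0, r0, r0" : "=r" (retval));
    return retval;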
}
inline void MAC(unsigned short a, unsigned short b)
{
    /* no outputs, so the two inputs are operands %0 and %1 */
    __asm__ volatile ("custom 1, r0, %0, %1" : : "r" (a), "r" (b));
}
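As a usage illustration, a dot product maps naturally onto these two calls: one MAC per element pair, then one read to fetch and clear the sum. The sketch below is a host-side model only, not code for the target. `model_mac`, `model_get_and_reset_mac` and `dot_product_mac` are hypothetical names, and the two model functions merely mimic the behaviour described here (unsigned 16-bit operands, a 32-bit accumulator that wraps on overflow) so the loop can be compiled and tried on a PC. On the Nios II they would be replaced by the two inline functions above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software stand-in for the hardware accumulator. */
static uint32_t model_accum = 0;

/* Mimics custom instruction 1: accum += a * b, operands unsigned. */
static void model_mac(uint16_t a, uint16_t b)
{
    model_accum += (uint32_t)a * (uint32_t)b;
}

/* Mimics custom instruction 0: return the accumulator, then clear it. */
static uint32_t model_get_and_reset_mac(void)
{
    uint32_t retval = model_accum;
    model_accum = 0;
    return retval;
}

/* Dot product of two unsigned 16-bit vectors using the MAC model. */
uint32_t dot_product_mac(const uint16_t *x, const uint16_t *y, size_t len)
{
    (void)model_get_and_reset_mac();   /* start from a cleared accumulator */
    for (size_t i = 0; i < len; ++i)
        model_mac(x[i], y[i]);         /* one MAC per element pair */
    return model_get_and_reset_mac();  /* read the sum and clear for next use */
}
```

Clearing the accumulator first costs one extra instruction but makes the result independent of whatever was accumulated before the call.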
Notes:
- I didn't test the code (so I give no warranty for its correctness), and I don't have the template with the signal names for a custom instruction at hand, so something may be missing; you'll figure out what I mean.
- With n you can make 256 custom instructions, so it can be extended.
- The "start" signal is the kick-off for the instruction; the processor waits for done to go high. That's why done for the multiply instruction is delayed (l_start): it is the pipeline stage.
- There is some excellent documentation on this by Altera.
I hope this helps
Stefaan
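To make the start/done note concrete, here is a small cycle-level model in C of the handshake. It is a sketch under assumptions, not a translation of tested hardware: the names `mac_unit`, `mac_clock` and `mac_issue` are hypothetical, start is assumed to be high only in the first cycle of an instruction, and the model lets only the multiply set the delayed-start flag. It shows the multiply taking two clocks and the accumulator read taking one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { OP_ACC = 0, OP_MULT = 1 };

typedef struct {
    uint32_t accum;    /* accumulator register */
    uint32_t mult;     /* registered multiplier output */
    bool     l_start;  /* start delayed by one clock (multiply only) */
} mac_unit;

/* Combinational done: the multiply is done one cycle after start. */
static bool mac_done(const mac_unit *u, unsigned n)
{
    return (n == OP_MULT) ? u->l_start : true;
}

/* Combinational result: only the accumulator read returns data. */
static uint32_t mac_result(const mac_unit *u, unsigned n)
{
    return (n == OP_ACC) ? u->accum : 0;
}

/* One rising clock edge; next state is computed from the current state. */
static void mac_clock(mac_unit *u, bool start, unsigned n,
                      uint16_t a, uint16_t b)
{
    mac_unit next = *u;
    next.l_start = start && (n == OP_MULT);  /* assumption: multiply only */
    if (start) {
        if (n == OP_ACC)
            next.accum = 0;
        else if (n == OP_MULT)
            next.mult = (uint32_t)a * (uint32_t)b;
    }
    if (u->l_start && n == OP_MULT)
        next.accum = u->accum + u->mult;     /* second pipeline stage */
    *u = next;
}

/* Issues one instruction and stalls until done; returns clocks used. */
unsigned mac_issue(mac_unit *u, unsigned n, uint16_t a, uint16_t b,
                   uint32_t *result)
{
    unsigned cycles = 0;
    bool start = true;                       /* high in the first cycle only */
    for (;;) {
        bool done = mac_done(u, n);
        if (done && result != NULL)
            *result = mac_result(u, n);      /* sampled while done is high */
        mac_clock(u, start, n, a, b);
        start = false;
        ++cycles;
        if (done)
            return cycles;
    }
}
```

Each call to `mac_issue` returns the number of clocks the processor would spend on that instruction in this model.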