I meant to say that the 'ra' and 'rb' bits (readra and readrb) are ignored.
Think of what happens during the 'Decode' pipeline phase:
- opcode bits 31-27 read register file (M9K) port 'a'.
- opcode bits 26-22 read register file port 'b' (dual-ported reads).
- D phase stall is detected (write pending to either register [1]).
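As a rough illustration of the two register reads listed above, here is a minimal C sketch of how the read addresses fall out of the instruction word. The helper names are made up for this post; only the bit positions come from the list.

```c
#include <stdint.h>

/* Hypothetical helpers: the names are illustrative, the bit positions
 * are the ones given in the decode list (31-27 and 26-22). */
static inline unsigned reg_a_index(uint32_t insn)
{
    return (insn >> 27) & 0x1Fu;   /* bits 31-27 -> register file port 'a' */
}

static inline unsigned reg_b_index(uint32_t insn)
{
    return (insn >> 22) & 0x1Fu;   /* bits 26-22 -> register file port 'b' */
}
```

Both indices are taken from fixed positions before the opcode has been looked at, which is why the reads happen regardless of what the instruction turns out to be.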
Now we have three 32-bit values, which are fed into all the ALU functions during the 'Execute' pipeline phase (including the combinatorial custom instructions). All of them generate their result from the 96 input bits.
The opcode bits 5-0 (opcode) and bits 13-6 (custom code) act as a big mux on the results of all the instruction logic, together with a 'write-back' flag (bit 14 for custom). The selected result and the flag are latched for writing to the register file on the next clock [2].
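The select fields mentioned here can be pulled out the same way. This is only a sketch with invented function names; the bit positions are the ones quoted in the paragraph above.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helpers: names are illustrative only. */
static inline unsigned op_field(uint32_t insn)
{
    return insn & 0x3Fu;           /* bits 5-0: main opcode */
}

static inline unsigned custom_n(uint32_t insn)
{
    return (insn >> 6) & 0xFFu;    /* bits 13-6: custom instruction code */
}

static inline bool custom_writeback(uint32_t insn)
{
    return ((insn >> 14) & 1u) != 0;   /* bit 14: write-back flag for custom */
}
```

In hardware these fields would drive the result mux and the write enable rather than being computed one after another, so the C functions only show which bits are involved.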
[1] Careful inspection of the opcode table shows that a stall on the A read is needed for everything except 'call' and 'jmpi' [3], and on the B read if the opcode bits 0 and 1 differ (bit 2 set would be less logic!). I really can't believe there is also a check dependent on the custom opcode value.
[2] Such a write would then be too late for the next instructions, so I suspect there is a two-entry FIFO with a fast path into the decode phase of the following instructions.
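The stall rule from note [1] could be written out roughly like this. It is a guess at the logic, not the actual RTL: the opcode values 0x00 for 'call' and 0x01 for 'jmpi' are my assumption about the encoding, and the pending_mask model (one bit per register with a write outstanding) is invented purely for the example.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed encodings for the two opcodes that skip the A check. */
enum { OP_CALL = 0x00, OP_JMPI = 0x01 };

/* A read needs a stall check for everything except call and jmpi. */
static bool needs_a_stall_check(unsigned op)
{
    return op != OP_CALL && op != OP_JMPI;
}

/* B read needs a stall check when opcode bits 0 and 1 differ. */
static bool needs_b_stall_check(unsigned op)
{
    return ((op ^ (op >> 1)) & 1u) != 0;
}

/* Hypothetical model: bit n of pending_mask is set while a write
 * to register n is still outstanding. */
static bool decode_stall(uint32_t insn, uint32_t pending_mask)
{
    unsigned op = insn & 0x3Fu;
    unsigned ra = (insn >> 27) & 0x1Fu;
    unsigned rb = (insn >> 22) & 0x1Fu;
    bool a = needs_a_stall_check(op) && ((pending_mask >> ra) & 1u) != 0;
    bool b = needs_b_stall_check(op) && ((pending_mask >> rb) & 1u) != 0;
    return a || b;
}
```

Note that neither check looks at bits 13-6, which matches the point that the stall decision should not depend on the custom opcode value.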
(The write can be done in the same clock as two reads.)
[3] Quite a few instructions will actually read register 0 - hopefully there isn't a write pending! I've not tried writing to R0!