Hello, Is it possible to use a custom instruction from a user application running on NIOS2MMU - uCLinux? I'm trying to run the CRC design example on uCLinux. When I try to compile the software application I get a "macros undefined" error; I thought these macros would be part of the cross compiler's standard header files. Creating a BSP, which in turn generates the "System.h" file, fails because the BSP tool doesn't support NIOS2 with MMU. Is there any way to achieve this? Thanks, Chetan

Since a BSP is not used, you have to write the macros for the custom instructions yourself or use the builtins, which should be defined by the cross compiler. For example:

    /* Opcode for the byteswap custom instruction provided by Altera.
     * This may change if other custom instructions are present. */
    #define ALT_CI_BYTESWAP_N 0x00
    #define ALT_CI_BYTESWAP(x) __builtin_custom_ini(ALT_CI_BYTESWAP_N, (x))
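
A minimal sketch of how a user application might wrap the builtin so that the same source builds with the Nios II cross compiler and on a host PC. The opcode 0x00 is the one quoted above and depends on your system. The `__nios2__` guard, the helper names `sw_byteswap`, `swap32_ci` and `byteswap_selftest`, and the software fallback (a plain 32-bit byte reversal) are assumptions for illustration, so check what your custom instruction actually computes.

```c
#include <assert.h>
#include <stdint.h>

/* Opcode (N field) of the byteswap custom instruction; system dependent. */
#define ALT_CI_BYTESWAP_N 0x00

#if defined(__nios2__)
/* Cross compiler: emit the custom instruction through the GCC builtin. */
#define ALT_CI_BYTESWAP(x) __builtin_custom_ini(ALT_CI_BYTESWAP_N, (int)(x))
#else
/* Host build: hypothetical software stand-in (assumes a byte reversal). */
static inline uint32_t sw_byteswap(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}
#define ALT_CI_BYTESWAP(x) sw_byteswap((uint32_t)(x))
#endif

/* Hypothetical wrapper so callers never touch the macro directly. */
static inline uint32_t swap32_ci(uint32_t v)
{
    return (uint32_t)ALT_CI_BYTESWAP(v);
}

/* Start-up check: fails if opcode 0x00 is not wired to a byte swap. */
static inline void byteswap_selftest(void)
{
    assert(swap32_ci(0x11223344u) == 0x44332211u);
}
```

Calling `byteswap_selftest()` once at start-up is a cheap way to notice that the opcode number no longer matches the hardware after other custom instructions were added.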

Ykozlov, Thanks for the reply. The kernel compiled fine after manually defining the macros. Chetan

Do you really want to use the custom instructions in the Kernel, not just in a userland application ? -Michael

Michael, I meant to say the user application compiled without any errors after defining the macros. I'm not using custom instruction in the kernel, but only in the user application. Thanks, Chetan

FWIW, if you are doing CRC16 (the usual one for HDLC comms) then a custom instruction for the following C can be used:

    static __inline__ uint32_t crc_step(uint32_t crc, uint32_t byte_val)
    {
        uint32_t t = crc ^ (byte_val & 0xff);
        t = (t ^ t << 4) & 0xff;
        return crc >> 8 ^ t << 8 ^ t << 3 ^ t >> 4;
    }

The 4 levels of xor easily execute in a single clock. My notes suggest that the above C compiles to 11 instructions, and a lookup table version to 7 (with the table base in a global register).
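
A sketch for checking a hardware or software version of this step. It pairs `crc_step` with a bit-at-a-time reference, assuming the step implements the reflected CCITT polynomial 0x8408, which is how I read the shifts. The names `crc_step_bitwise` and `crc16_buf` are hypothetical helpers, not part of any Altera header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-byte update from the post above. */
static inline uint32_t crc_step(uint32_t crc, uint32_t byte_val)
{
    uint32_t t = crc ^ (byte_val & 0xff);
    t = (t ^ t << 4) & 0xff;
    return crc >> 8 ^ t << 8 ^ t << 3 ^ t >> 4;
}

/* Bit-at-a-time reference, assuming reflected polynomial 0x8408. */
static inline uint32_t crc_step_bitwise(uint32_t crc, uint32_t byte_val)
{
    crc ^= byte_val & 0xff;
    for (int i = 0; i < 8; i++)
        crc = (crc & 1u) ? (crc >> 1) ^ 0x8408u : crc >> 1;
    return crc;
}

/* Hypothetical helper: run the step over a whole buffer. */
static inline uint16_t crc16_buf(uint16_t init, const uint8_t *buf, size_t len)
{
    uint32_t crc = init;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        crc = crc_step(crc, buf[i]);
    return (uint16_t)crc;
}
```

For HDLC-style framing the usual convention is an initial value of 0xffff with the final result complemented; that choice is left to the caller here. Once the custom instruction is in place, `crc_step` can be replaced by the corresponding builtin call and compared against `crc_step_bitwise` in the same way.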

Using custom instruction from uCLinux user app on nios2mmu

16 Replies

Altera_Forum
Honored Contributor
14 years ago
I meant to say that the 'ra' and 'rb' bits (readra and readrb) are ignored.

Think of what happens during the 'Decode' pipeline phase:
- opcode bits 31-27 read register file (M9K) port 'a'.
- opcode bits 26-22 read register file port 'b' (dual ported reads)
- D phase stall is detected (write pending to either register [1]).

Now we have three 32bit values which are fed into all the ALU functions during the 'Execute' pipeline phase (including the combinatorial custom instructions); all will generate their result based on the 96 input bits.
The opcode bits 5-0 (opcode) and bits 13-6 (custom code) act as a big mux on the results of all the instruction logic, together with a 'write-back' flag (bit 14 for custom). These are latched for writing to the register file on the next clock [2].

[1] Careful inspection of the opcode table shows that a stall on the A read is needed for everything except 'call' and 'jmpi' [3], and on the B read if the opcode bits 0 and 1 differ (bit 2 set would be less logic!). I really can't believe there is also a check dependent on the custom opcode value.

[2] A write at that point would miss the next instructions, so I suspect there is a two entry fifo with a fast-path into the decode phase of the next instructions.
(The write can be done in the same clock as two reads.)

[3] Quite a few instructions will actually read register 0 - hopefully there isn't a write pending! I've not tried writing to R0!
Altera_Forum
Honored Contributor
14 years ago
Hi,

Did you hack the Nios2 core? :D

--- Quote Start ---
I meant to say that the 'ra' and 'rb' bits (readra and readrb) are ignored.

Think of what happens during the 'Decode' pipeline phase:
- opcode bits 31-27 read register file (M9K) port 'a'.
- opcode bits 26-22 read register file port 'b' (dual ported reads)
- D phase stall is detected (write pending to either register [1]).

Now we have three 32bit values which are fed into all the ALU functions during the 'Execute' pipeline phase (including the combinatorial custom instructions); all will generate their result based on the 96 input bits.
The opcode bits 5-0 (opcode) and bits 13-6 (custom code) act as a big mux on the results of all the instruction logic, together with a 'write-back' flag (bit 14 for custom). These are latched for writing to the register file on the next clock [2].

--- Quote End ---

The Nios2/f CPU pipeline is composed of 6 stages:

Fetch -- Decode -- Execute -- Memory -- Align -- Write Back.

Each instruction must get its operands before it enters the 'Execute' stage. But the register files are built from embedded memories, so I think the register files are always read starting from the 'Fetch' stage, even for instructions which do NOT need operand values, because the embedded memory needs 1 clock for its read (and write) access.

Kazu
Altera_Forum
Honored Contributor
14 years ago
The 'Fetch' cycle reads the opcode word from memory, so the register values must be read in the following cycle, the 'Decode' cycle, in order to be available during 'Execute'.
This read will be unconditional; the only question is the actual condition(s) for a 'D' phase stall (i.e. a re-execute of the same opcode word).
Altera_Forum
Honored Contributor
14 years ago
AFAIK, an instruction can use registers that are modified by the previous one (at least for calculations, maybe when used as an address in a load or store instruction this might cause a hazard and thus stall the pipeline).

So there seems to be a shortcut for the register values and they don't need to be physically saved before being read by the next instruction.

-Michael
Altera_Forum
Honored Contributor
14 years ago
Yes, I think the results of the combinatorial ALU block are written into a 2? entry fifo along with the register number as well as being written to the register file itself.
The values from this fifo take precedence over the values read from the register file itself.
This makes the values from single cycle ALU instructions available in the following instruction.
The results of load and potentially multi-cycle instructions are not fed into this fifo, so they force pipeline stalls (it is possible that the results aren't ready early enough in the clock cycle to do this without significantly reducing fmax).
Load and store instructions are always fully synchronous: they both wait for the Avalon bus transfer to complete. I'm sure the bus interface could trivially do a single async write (which would give an async fault on error). Async read is somewhat harder, since a pipeline stall would be needed to do the delayed write to the register file.
Possibly they could have done non-delayed reads from tightly coupled data memory - after all the memory read of 'rA + imm16' can be scheduled unconditionally for all tightly coupled data blocks.
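
As a toy illustration of the guess above (not a description of the real core), the bypass idea can be modelled in a few lines of C: ALU results go into a small buffer, operand reads look there first, and the register file is only updated when an entry falls out of the buffer. The depth of 2, the retire-on-push behaviour and all names are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define BYPASS_DEPTH 2   /* the "2?" entry fifo guessed above */

struct bypass_entry {
    int      valid;
    unsigned reg;
    uint32_t value;
};

struct regs_model {
    uint32_t            file[32];              /* register file (block RAM) */
    struct bypass_entry bypass[BYPASS_DEPTH];  /* index 0 is the newest     */
};

/* Record a single-cycle ALU result; the oldest entry retires to the file. */
static inline void alu_writeback(struct regs_model *m, unsigned reg, uint32_t value)
{
    struct bypass_entry *oldest = &m->bypass[BYPASS_DEPTH - 1];

    assert(reg < 32);
    if (oldest->valid)
        m->file[oldest->reg] = oldest->value;
    for (int i = BYPASS_DEPTH - 1; i > 0; i--)
        m->bypass[i] = m->bypass[i - 1];
    m->bypass[0] = (struct bypass_entry){ 1, reg, value };
}

/* Operand read: a matching bypass entry takes precedence over the file. */
static inline uint32_t read_operand(const struct regs_model *m, unsigned reg)
{
    assert(reg < 32);
    for (int i = 0; i < BYPASS_DEPTH; i++)
        if (m->bypass[i].valid && m->bypass[i].reg == reg)
            return m->bypass[i].value;
    return m->file[reg];
}
```

In this model a result is visible to the very next `read_operand` call even though the register file still holds the old value, which is the behaviour described for back-to-back ALU instructions. Loads would bypass this structure entirely and stall instead.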
Altera_Forum
Honored Contributor
14 years ago
Hi,

--- Quote Start ---

This read will be unconditional, the only question is the actual condition(s) for a 'D' phase stall (ie a re-execute for the same opcode word).
--- Quote End ---

Of course, the 'D' phase stall is triggered in the case of a 'Data Hazard'.

--- Quote Start ---
AFAIK, an instruction can use registers that are modified by the previous one (at least for calculations, maybe when used as an address in a load or store instruction this might cause a hazard and thus stall the pipeline).
-Michael
--- Quote End ---

And of course, the Nios2/f core has a 'forwarding mechanism'. Maybe those paths come from the outputs of the 'Execute', 'Align', and 'Write Back' stages, and I think (maybe) the 'Memory' stage doesn't have one for the sake of simplicity, because a load instruction needs at least 2 clocks when the core uses the data cache, and only the 'Align' stage can stall after the 'Execute' stage (this means that the 'Memory' stage is a dummy stage for simple instructions such as add or sub). Maybe the Nios2/f uses a 'scoreboard' algorithm, with the latest value supplied from the forwarding paths when the target operand is present in one of these stages. So I think the 'D' phase stall is triggered when the next instruction needs the result of a memory read or the result of the 'Memory' stage.

Kazu
