Forum Discussion

Altera_Forum (Honored Contributor)
14 years ago

Using a custom instruction from a uCLinux user app on nios2mmu

Hello,

Is it possible to use a custom instruction from a user application running on uCLinux on NIOS2MMU (Nios II with MMU)? I'm trying to run the CRC design example on uCLinux. When I try to compile the software application I get "macros undefined" errors for macros I thought would be part of the cross compiler's standard header files. Creating a BSP, which in turn generates the system.h file, fails because the BSP tool doesn't support Nios II with MMU.

Is there any way to achieve this?

Thanks,

Chetan
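One possible workaround, sketched under assumptions: the custom-instruction macros that system.h normally provides are typically thin wrappers around GCC's Nios II built-ins (`__builtin_custom_ini` and related functions), so an equivalent wrapper can be written by hand. `MY_CI_N` and `my_ci_bswap` are hypothetical names, the selector value 0 is a placeholder for the real N taken from the hardware system, and the byte swap only stands in for whatever the instruction actually computes. The host fallback lets the wrapper be built and tested off-target.

```c
#include <stdint.h>

/* Hypothetical selector index N of the custom instruction. The BSP would
 * normally emit the real value into system.h; copy it from the hardware
 * system by hand. */
#define MY_CI_N 0

/* Assumed behaviour for this sketch: a combinational 32-bit byte swap. */
static inline uint32_t my_ci_bswap(uint32_t x)
{
#if defined(__nios2__)
    /* GCC Nios II built-in: int __builtin_custom_ini(int n, int a);
     * n must be a compile-time constant. */
    return (uint32_t)__builtin_custom_ini(MY_CI_N, (int)x);
#else
    /* Host fallback: a software model of the same function. */
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
#endif
}
```

Keeping the software model next to the built-in call also makes it easy to compare hardware and software results on the target.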

16 Replies

  • Altera_Forum

    I meant to say that the 'ra' and 'rb' bits (readra and readrb) are ignored.

    Think of what happens during the 'Decode' pipeline phase:

    - Opcode bits 31-27 (field A) address read port 'a' of the register file (an M9K block).

    - Opcode bits 26-22 (field B) address read port 'b' (the register file has two read ports).

    - A D-phase stall is detected if a write is pending to either register [1].

    Now we have three 32-bit values, which are fed into all the ALU functions during the 'Execute' pipeline phase (including the combinatorial custom instructions); each of them generates its result from these 96 input bits.

    Opcode bits 5-0 (the opcode) and bits 13-6 (the custom code N) act as the select lines of a big mux over the results of all the instruction logic. The selected result and a write-back flag (bit 14, readrc, for custom instructions) are latched for writing to the register file on the next clock [2].
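    For reference, the custom-instruction word walked through above can be unpacked as follows. The field positions are those of the Nios II custom instruction format (A 31-27, B 26-22, C 21-17, readra 16, readrb 15, readrc 14, N 13-6, OP 5-0 = 0x32); the struct and function names are illustrative, and the code models only the bit slicing, not the pipeline.

```c
#include <stdint.h>

#define NIOS2_OP_CUSTOM 0x32u  /* OP field value for custom instructions */

/* Fields of a Nios II custom instruction word. */
typedef struct {
    unsigned a, b, c;  /* register numbers, 5 bits each */
    unsigned readra;   /* 1: operand A comes from the core register file */
    unsigned readrb;   /* 1: operand B comes from the core register file */
    unsigned readrc;   /* 1: result is written back to core register C */
    unsigned n;        /* 8-bit custom instruction selector */
    unsigned op;       /* 6-bit opcode */
} ci_fields;

static ci_fields ci_decode(uint32_t w)
{
    ci_fields f;
    f.a      = (w >> 27) & 0x1Fu;
    f.b      = (w >> 22) & 0x1Fu;
    f.c      = (w >> 17) & 0x1Fu;
    f.readra = (w >> 16) & 1u;
    f.readrb = (w >> 15) & 1u;
    f.readrc = (w >> 14) & 1u;
    f.n      = (w >> 6) & 0xFFu;
    f.op     = w & 0x3Fu;
    return f;
}

static uint32_t ci_encode(ci_fields f)
{
    return ((uint32_t)(f.a & 0x1Fu) << 27) | ((uint32_t)(f.b & 0x1Fu) << 22) |
           ((uint32_t)(f.c & 0x1Fu) << 17) | ((uint32_t)(f.readra & 1u) << 16) |
           ((uint32_t)(f.readrb & 1u) << 15) | ((uint32_t)(f.readrc & 1u) << 14) |
           ((uint32_t)(f.n & 0xFFu) << 6) | (uint32_t)(f.op & 0x3Fu);
}
```

    In the post's description, the Decode stage would read registers a and b whatever readra and readrb say, while op and n only pick which result gets written back.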

    [1] Careful inspection of the opcode table shows that a stall on the A read is needed for everything except 'call' and 'jmpi' [3], and a stall on the B read whenever opcode bits 0 and 1 differ (a single bit, such as bit 2, would have been less logic!). I really can't believe there is also a check dependent on the custom opcode value.
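    The rule of thumb in footnote [1] can be written as two predicates on the 6-bit OP field. This encodes the poster's reading of the opcode table, not Altera's documented hazard logic; the function names are illustrative. The opcodes used to check it (call 0x00, jmpi 0x01, addi 0x04, stw 0x15, ldw 0x17, beq 0x26, custom 0x32, R-type 0x3A) come from the Nios II opcode table.

```c
#include <stdbool.h>

/* Hypothesis from the post: every instruction except call (0x00) and
 * jmpi (0x01) reads rA, so a pending write to rA must stall it. */
static bool may_stall_on_a(unsigned op)
{
    op &= 0x3Fu;
    return op != 0x00u && op != 0x01u;
}

/* Hypothesis from the post: rB is a source (not a destination) exactly
 * when OP bits 0 and 1 differ, e.g. stores, branches, R-type, custom. */
static bool may_stall_on_b(unsigned op)
{
    return ((op ^ (op >> 1)) & 1u) != 0;
}
```

    For example, stw (0x15) and beq (0x26) use rB as a source, while addi (0x04) and ldw (0x17) write rB, and the bit 0/bit 1 test separates them.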

    [2] A write at that point would be too late for the next instructions, so I suspect there is a two-entry FIFO with a fast path into the decode phase of the following instructions.

    (The write can be done in the same clock as two reads.)

    [3] Quite a few instructions will actually read register 0 - hopefully there isn't a write pending! I've not tried writing to R0!
  • Altera_Forum

    Hi,

    Did you hack the Nios2 core? :D

    --- Quote Start ---

    I meant to say that the 'ra' and 'rb' bits (readra and readrb) are ignored.

    Think of what happens during the 'Decode' pipeline phase:

    - Opcode bits 31-27 (field A) address read port 'a' of the register file (an M9K block).

    - Opcode bits 26-22 (field B) address read port 'b' (the register file has two read ports).

    - A D-phase stall is detected if a write is pending to either register [1].

    Now we have three 32-bit values, which are fed into all the ALU functions during the 'Execute' pipeline phase (including the combinatorial custom instructions); each of them generates its result from these 96 input bits.

    Opcode bits 5-0 (the opcode) and bits 13-6 (the custom code N) act as the select lines of a big mux over the results of all the instruction logic. The selected result and a write-back flag (bit 14, readrc, for custom instructions) are latched for writing to the register file on the next clock [2].

    --- Quote End ---

    The Nios2/f CPU pipeline has six stages:

    Fetch -- Decode -- Execute -- Memory -- Align -- Write Back.

    Each instruction must have its operands before it enters the 'Execute' stage. But the register file is built from embedded memory blocks, which need one clock for each read (and write) access, so I think the register file is always read starting in the 'Fetch' stage, even for instructions that do NOT need operand values.
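    A toy model of the six stages listed above, assuming one instruction enters Fetch per clock and there are no stalls: instruction k occupies stage (t - k) at cycle t. It only illustrates the timing argument (operands must be read by the cycle before Execute); the enum and function names are illustrative.

```c
/* The six Nios2/f stages in order; ST_NONE means "not in the pipeline". */
typedef enum { ST_F, ST_D, ST_E, ST_M, ST_A, ST_W, ST_NONE } stage;

/* Stage occupied by instruction k (which enters Fetch at cycle k) at
 * cycle t, assuming a stall-free pipeline. */
static stage stage_at(int k, int t)
{
    int s = t - k;
    return (s >= 0 && s <= (int)ST_W) ? (stage)s : ST_NONE;
}
```

    At cycle 2, instruction 0 is in Execute while instruction 1 is in Decode, so a result computed by instruction 0 cannot have reached the register file when instruction 1 reads its operands.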

    Kazu
  • Altera_Forum

    The 'Fetch' cycle reads the opcode word from memory, the register values must be read in the following cycle - the 'Decode' cycle - in order to be available during 'Execute'.

    This read will be unconditional, the only question is the actual condition(s) for a 'D' phase stall (ie a re-execute for the same opcode word).
  • Altera_Forum

    AFAIK, an instruction can use registers that are modified by the previous one (at least for calculations, maybe when used as an address in a load or store instruction this might cause a hazard and thus stall the pipeline).

    So there seems to be a shortcut path for register values: they don't need to be physically written to the register file before being read by the next instruction.

    -Michael
  • Altera_Forum

    Yes, I think the results of the combinatorial ALU block are written into a small FIFO (two entries?) together with the register number, as well as being written to the register file itself.

    The values from this FIFO take precedence over the values read from the register file itself.

    This makes the results of single-cycle ALU instructions available to the following instruction.
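    The bypass described above can be modelled in a few lines. Assumptions, since the real Nios2/f logic isn't public: a two-entry FIFO of (register, value) pairs, the newest entry checked first, entries overriding the register-file read, and the oldest entry retiring to the register file when a new result arrives. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define BYPASS_DEPTH 2  /* assumed depth ("two entries?" in the post) */

typedef struct { bool valid; unsigned reg; uint32_t value; } bypass_entry;

typedef struct {
    uint32_t regfile[32];
    bypass_entry fifo[BYPASS_DEPTH];  /* fifo[0] is the newest entry */
} core_model;

/* Record an ALU result: the oldest FIFO entry retires to the register
 * file, and the new result enters at the front of the FIFO. */
static void alu_writeback(core_model *c, unsigned reg, uint32_t value)
{
    bypass_entry old = c->fifo[BYPASS_DEPTH - 1];
    if (old.valid)
        c->regfile[old.reg] = old.value;
    for (int i = BYPASS_DEPTH - 1; i > 0; i--)
        c->fifo[i] = c->fifo[i - 1];
    c->fifo[0] = (bypass_entry){ true, reg & 31u, value };
}

/* Operand read in Decode: FIFO entries (newest first) take precedence
 * over the possibly stale register-file contents. */
static uint32_t read_operand(const core_model *c, unsigned reg)
{
    for (int i = 0; i < BYPASS_DEPTH; i++)
        if (c->fifo[i].valid && c->fifo[i].reg == (reg & 31u))
            return c->fifo[i].value;
    return c->regfile[reg & 31u];
}
```

    Checking the newest entry first matters when two back-to-back instructions write the same register: the later value must win.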

    The results of loads and of potentially multi-cycle instructions are not fed into this FIFO, so they force pipeline stalls (possibly because those results aren't ready early enough in the clock cycle to be forwarded without significantly reducing fmax).

    Load and store instructions are always fully synchronous: both wait for the Avalon bus transfer to complete. I'm sure the bus interface could trivially do a single asynchronous write (an error would then be reported as an asynchronous fault). An asynchronous read is somewhat harder, since a pipeline stall would be needed to do the delayed write to the register file.
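    A sketch of the single asynchronous (posted) write suggested above: a store is handed to a one-entry buffer and the CPU carries on, and only a store that arrives before the previous one has finished has to wait. This is a hypothetical model, not how the Nios2/f bus interface works; the names and cycle counts are made up, and error reporting (the asynchronous fault) is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-entry posted-write buffer between the CPU and the bus. */
typedef struct {
    bool busy;
    uint32_t addr, data;
    unsigned cycles_left;  /* bus clocks until the pending write completes */
} write_buffer;

/* Issue a store. Returns the clocks the CPU stalls: 0 if the buffer is
 * free (the write is posted), otherwise the time needed to drain the
 * previous write first. */
static unsigned post_store(write_buffer *wb, uint32_t addr, uint32_t data,
                           unsigned bus_cycles)
{
    assert(bus_cycles >= 1);
    unsigned stall = wb->busy ? wb->cycles_left : 0;
    wb->busy = true;
    wb->addr = addr;
    wb->data = data;
    wb->cycles_left = bus_cycles;
    return stall;
}

/* Advance the bus by one clock; the pending write retires when done. */
static void bus_tick(write_buffer *wb)
{
    if (wb->busy && --wb->cycles_left == 0)
        wb->busy = false;
}
```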

    Possibly they could have done non-delayed reads from tightly coupled data memory; after all, the memory read at 'rA + imm16' can be scheduled unconditionally for all tightly coupled data blocks.
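    The 'rA + imm16' address depends only on rA and the instruction word: in the Nios II I-type format the 16-bit immediate sits in bits 21-6, and loads and stores sign-extend it. A small sketch of that address computation (the function name is illustrative):

```c
#include <stdint.h>

/* Effective address of a Nios II I-type load/store: rA plus the
 * sign-extended IMM16 field (instruction bits 21..6). */
static uint32_t ls_effective_address(uint32_t insn, uint32_t ra_value)
{
    uint32_t imm16 = (insn >> 6) & 0xFFFFu;
    uint32_t offset = (imm16 & 0x8000u) ? (imm16 | 0xFFFF0000u) : imm16;
    return ra_value + offset;  /* unsigned wrap-around gives rA - |imm| */
}
```

    Since nothing here waits on the data bus, the address is known as soon as rA is, which is what would allow the tightly coupled memory read to be started unconditionally.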
  • Altera_Forum

    Hi,

    --- Quote Start ---

    This read will be unconditional, the only question is the actual condition(s) for a 'D' phase stall (ie a re-execute for the same opcode word).

    --- Quote End ---

    Of course, the 'D' phase stall is triggered by a data hazard.

    --- Quote Start ---

    AFAIK, an instruction can use registers that are modified by the previous one (at least for calculations, maybe when used as an address in a load or store instruction this might cause a hazard and thus stall the pipeline).

    -Michael

    --- Quote End ---

    And of course, the Nios2/f core has a forwarding mechanism. Maybe the forwarding paths come from the outputs of the 'Execute', 'Align', and 'Write Back' stages, and I think (maybe) the 'Memory' stage doesn't have one, for the sake of simplicity: a load instruction needs at least 2 clocks when the core uses the data cache, and only the 'Align' stage can stall after the 'Execute' stage (so the 'Memory' stage is a dummy stage for simple instructions such as add or sub). Maybe the Nios2/f uses a scoreboard algorithm, and the latest value is supplied from a forwarding path when the target operand is in one of these stages. So I think the 'D' phase stall is triggered when the next instruction needs the result of a memory read, or a result that is in the 'Memory' stage.
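    The guess above can be expressed as a scoreboard-style operand selector: find the youngest in-flight instruction that writes the source register; forward its value if it is in Execute, Align, or Write Back, and stall Decode if it is in Memory or is a load still in Execute. This is a hypothetical model of the speculation in the post, with illustrative names (treating the Align output as valid load data is an extra assumption), not the actual Nios2/f design.

```c
#include <stdbool.h>

typedef enum { BE_E, BE_M, BE_A, BE_W } back_stage;  /* stages after Decode */
typedef enum { SRC_REGFILE, SRC_FWD_E, SRC_FWD_A, SRC_FWD_W, SRC_STALL } operand_source;

typedef struct {
    bool valid;       /* stage holds an instruction */
    bool writes_reg;  /* it produces a register result */
    bool is_load;     /* the result comes from a memory read */
    unsigned dest;    /* destination register number */
} inflight;

/* s[BE_E..BE_W] are the instructions in each back-end stage; BE_E holds
 * the youngest, so it is checked first. */
static operand_source select_source(const inflight s[4], unsigned reg)
{
    if (reg == 0)
        return SRC_REGFILE;  /* r0 always reads as zero */
    for (int st = BE_E; st <= BE_W; st++) {
        if (!s[st].valid || !s[st].writes_reg || s[st].dest != reg)
            continue;
        /* No path from Memory; load data is not read yet in Execute. */
        if (st == BE_M || (s[st].is_load && st == BE_E))
            return SRC_STALL;
        return st == BE_E ? SRC_FWD_E : st == BE_A ? SRC_FWD_A : SRC_FWD_W;
    }
    return SRC_REGFILE;
}
```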

    Kazu