NIOS II Custom Instruction access to CPU Regs

Question

Hi,

I'm looking for a way to access the internal CPU registers via a custom instruction. I didn't find anything about this in the NIOS II documentation.

My problem is the following: the interrupt response of the NIOS II is very slow (NIOS "e" core at 50 MHz reaches IRQ0 after about 12 µs; NIOS "f" core at 50 MHz reaches IRQ0 after about 3 µs), so I'd like to speed up the interrupt service routines from Altera. To do that, I'd like to save the CPU registers into shadow registers via a custom instruction. Is this possible?

Thanks for your help,

altera_forum · Answer

Custom instructions are coupled to the ALU, so the best you could do is save a couple of registers (2-3 actually). Custom instructions are meant for receiving 2-3 inputs and outputting a single result back to the NIOS core. So, in short, no, this would not be possible.

I'm curious: what hard deadline does your ISR need to meet? Also, what memory are you using for your code and the ISR code? You would want your "fast" code to be in fast memory (on-chip memory), so I would look into that. If you can run your NIOS core faster, you may want to do that as well (if you are using Stratix, you can easily clear 100 MHz).

Good luck

altera_forum · Answer

Hi,

I placed the service routine in on-chip memory, but the result is the same. I'm wondering why it takes so long, because I counted the assembler instructions in alt_irq_entry.h and alt_irq_handler and calculated an interrupt response time of about 1.5 µs (target system is a NIOS II "f" core at 50 MHz). I'd like to know why my system needs so much more time. Do you have any experience with interrupt response times on the NIOS II? We would just like to know how much time an interrupt needs on a NIOS II system.
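As a quick sanity check on these figures, the latencies can be converted to clock cycles. The sketch below is plain C with a hypothetical helper name (`latency_ns_to_cycles`); it is not part of the HAL. At 50 MHz one cycle is 20 ns, so the measured 12 µs and 3 µs correspond to 600 and 150 cycles, while the calculated 1.5 µs is only 75 cycles.

```c
#include <stdint.h>

/* Convert a latency in nanoseconds to CPU clock cycles.
 * cycles = latency_ns * clock_mhz / 1000
 * (hypothetical helper for illustration, not a HAL function) */
uint32_t latency_ns_to_cycles(uint32_t latency_ns, uint32_t clock_mhz)
{
    /* 64-bit intermediate so large latencies do not overflow */
    return (uint32_t)(((uint64_t)latency_ns * clock_mhz) / 1000u);
}
```

For example, `latency_ns_to_cycles(3000, 50)` gives the cycle count for the measured "f" core latency, which is twice the 75 cycles expected from counting instructions.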

Thanks

altera_forum · Answer

The interrupt routine is slow because, most of the time, the code itself does not come out of the instruction cache. And when it does, that is certainly not the worst-case timing.

Storing and restoring the registers is also slow. It can be faster if a data cache is available: because interrupts are generally small and use small data sets, the register contents will still be in the data cache. Do not think internal RAM is a solution for everything. I did some small benchmarks, and it seems to take 3 or more cycles to read a value and around 2 to store one, so it is not single-cycle (if I'm wrong, please let me know).
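To get a feel for what those per-access costs mean for an ISR, here is a rough estimate in C. It is only a sketch: the cycle costs are the approximate benchmark figures above (the read cost is a lower bound), and the number of saved registers is an assumption you would replace with the count from your own entry code.

```c
#include <stdint.h>

/* Assumed per-access costs, from the rough benchmark figures above. */
#define LOAD_CYCLES  3u  /* "3 or more" cycles per read: a lower bound */
#define STORE_CYCLES 2u  /* "around 2" cycles per store */

/* Cycles spent storing num_regs registers on entry and loading them
 * back on exit (one store plus one load per register). */
uint32_t save_restore_cycles(uint32_t num_regs)
{
    return num_regs * (STORE_CYCLES + LOAD_CYCLES);
}

/* Convert a cycle count to nanoseconds at the given clock in MHz. */
uint32_t cycles_to_ns(uint32_t cycles, uint32_t clock_mhz)
{
    return (uint32_t)(((uint64_t)cycles * 1000u) / clock_mhz);
}
```

With an assumed 18 saved registers this gives 90 cycles, or 1800 ns at 50 MHz, for the save and restore alone, before any handler code runs.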

Also keep in mind: if interrupts are not re-enabled, only the first interrupt that comes in will be handled fast. The priority mechanism does not work in this case (at least not for worst-case timing).

So some tips:

1. Write it in assembler (including calculating the offset into the function table and calling the function).

2. Figure out when it is possible to re-enable interrupts. It takes some time to re-enable and disable them, but it can save a lot on worst-case latency.

3. Use an inner loop (between register saving and restoring) to handle lower-priority interrupts that became active while the higher-priority interrupt was being handled. (So less overhead for saving and restoring, but still the chance for high-priority interrupts to come in.)

4. Be careful with the fast core and the data cache. Check what can be cached and what cannot. There also seems to be a strange issue when re-enabling interrupts and using the fast core. If you ever see strange effects that go away when the data cache is disabled, let me know; then I know I'm not the only one with this issue, and maybe Altera will listen.
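Tip 3 can be sketched in C as a host-side model. Everything here is hypothetical: `pending_irqs` stands in for the hardware pending register, the table and function names are made up for illustration, and IRQ 0 is assumed to have the highest priority. A real implementation would be in assembler (tip 1) and would sit between the register save and restore code.

```c
#include <stdint.h>

#define NUM_IRQS 32u

typedef void (*irq_handler_t)(void *context);

static irq_handler_t handler_table[NUM_IRQS];
static void *handler_context[NUM_IRQS];

/* Stand-in for the hardware pending-interrupt register (model only). */
static volatile uint32_t pending_irqs;

void register_irq(unsigned irq, irq_handler_t handler, void *context)
{
    if (irq < NUM_IRQS) {
        handler_table[irq] = handler;
        handler_context[irq] = context;
    }
}

/* Model of an interrupt line becoming active. */
void raise_irq(unsigned irq)
{
    if (irq < NUM_IRQS)
        pending_irqs |= (uint32_t)1 << irq;
}

/* Inner loop: runs once after the registers have been saved and keeps
 * dispatching until nothing is pending, so interrupts that arrive while
 * a handler runs share a single save/restore. Returns the number of
 * interrupts serviced. */
unsigned dispatch_pending(void)
{
    unsigned handled = 0;
    uint32_t active;

    /* Re-read the pending mask after every handler. */
    while ((active = pending_irqs) != 0) {
        unsigned irq = 0;
        while (((active >> irq) & 1u) == 0)
            irq++;                              /* lowest set bit first */
        pending_irqs &= ~((uint32_t)1 << irq);  /* model: servicing clears it */
        if (handler_table[irq])
            handler_table[irq](handler_context[irq]);
        handled++;
    }
    return handled;
}

/* Example handlers: each bumps a counter passed as context. The second
 * also raises IRQ 5 to mimic a lower-priority interrupt arriving while
 * the higher-priority one is being handled. */
void count_handler(void *context)
{
    (*(unsigned *)context)++;
}

void high_handler(void *context)
{
    (*(unsigned *)context)++;
    raise_irq(5);
}
```

Registering `high_handler` on IRQ 0 and `count_handler` on IRQ 5, then raising IRQ 0, makes one call to `dispatch_pending()` service both interrupts. The model does not re-enable interrupts inside the loop; where to do that is the separate question from tip 2.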

Stefaan.

altera_forum · Answer

Hi Stefaan,

thanks for your information. That seems strange: I thought that with the "f" core it would take only 1 cycle to store and load data from internal RAM. In the SOPC designer the internal RAM is created with no wait states, so I thought I only had to look at the "fundamental slave read transfers" of the Avalon bus, which means 1 cycle. The same goes for the write transfer (Avalon Bus Specification Reference Manual, pages 27 and 37). Did I misunderstand something?

Holger

altera_forum · Answer

Holger,

Look at the processor documentation, page 190 of the reference handbook (PDF page number).

Load and store with Avalon transfer: > 1 cycle, so it could be 100 without violating the spec. So even with single-cycle RAM access and nothing else accessing the bus at that moment (DMA transfers, instruction loads, ...), it is more than one cycle according to the spec. I don't think Altera would put '>' in the documentation if there were a chance of '>='.

Load and store without Avalon transfer: = 1 cycle. So if the cache contains the data, it should take 1 cycle; this has nothing to do with internal or external RAM. When an interrupt occurs, the caches will normally not contain the addresses where you want to store the registers, and even if they do (e.g. when you have just left a function with a big stack frame and the stack addresses are reused for the interrupt stack frame), you cannot count on it for maximum (worst-case) values.

Stefaan
