Altera_Forum
11 years ago

IOWR/IORD latency

Hi all!

Sorry for the previous empty thread, I have no idea what happened.

I'm seeing some strange latency when using the IOWR/IORD macros. I added a custom 8-bit slave in Qsys, and reading or writing its registers takes a huge number of cycles. I thought the issue was caused by a mistake in my peripheral, but then I tried reading the on-chip memory and got the same result!

Here is the code; I'm using the performance counter:

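#include "system.h"                             /* *_BASE addresses generated for the system */
#include "io.h"                                 /* IORD_*DIRECT macros */
#include "altera_avalon_performance_counter.h"  /* PERF_* macros and report function */

/* PRER_LO is a register offset defined elsewhere in the project (not shown). */

int main() {
    PERF_RESET(PERFORMANCE_COUNTER_0_BASE);
    PERF_START_MEASURING(PERFORMANCE_COUNTER_0_BASE);

    /* Section 1: a single 8-bit read */
    PERF_BEGIN(PERFORMANCE_COUNTER_0_BASE,1);
    IORD_8DIRECT(ONCHIP_MEMORY2_0_BASE, PRER_LO);
    PERF_END(PERFORMANCE_COUNTER_0_BASE,1);

    /* Section 2: a single 16-bit read */
    PERF_BEGIN(PERFORMANCE_COUNTER_0_BASE,2);
    IORD_16DIRECT(ONCHIP_MEMORY2_0_BASE, PRER_LO);
    PERF_END(PERFORMANCE_COUNTER_0_BASE,2);

    /* Section 3: a single 32-bit read */
    PERF_BEGIN(PERFORMANCE_COUNTER_0_BASE,3);
    IORD_32DIRECT(ONCHIP_MEMORY2_0_BASE, PRER_LO);
    PERF_END(PERFORMANCE_COUNTER_0_BASE,3);

    perf_print_formatted_report(PERFORMANCE_COUNTER_0_BASE,50000000,3,"IORD_8","IORD_16","IORD32");
    return 0;
}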

And this is what I get:

--Performance Counter Report--
Total Time : 10 usec (532 clock-cycles)
+---------------+-----+------------+---------------+------------+
| Section       |  %  | Time (usec)|  Time (clocks)|Occurrences |
+---------------+-----+------------+---------------+------------+
|         IORD_8|   9 |          1 |            51 |         1  |
+---------------+-----+------------+---------------+------------+
|        IORD_16|   9 |          1 |            50 |         1  |
+---------------+-----+------------+---------------+------------+
|         IORD32|   8 |          0 |            47 |         1  |
+---------------+-----+------------+---------------+------------+

OK, the performance counter itself adds some time to this: I measured about 30 clock cycles of overhead. That still leaves about 20 clock cycles per read, which is bad. What could I be doing wrong?

I'm using Quartus 15.0 Web Edition and a Nios II Gen2 /e core.

2 Replies

  • Altera_Forum

    You will probably find that the performance counters represent considerable overhead. Try reading the registers 10,000 times in one performance counter. Then divide the time by 10,000 to get a result that isn't lost in noise.

  • Altera_Forum

    Nios II /e is a slow processor that requires a minimum of 5 clock cycles to complete 1 instruction.

    The overhead from the performance counter is large in comparison to the total measured time per IORD instruction (about 30 of 50 cycles). I agree with Galfonz that you should run many iterations of IORD for better accuracy.

    Also try to look at simulations. This would definitely be a better way to understand the behavior of the RTL.