Altera_Forum
Honored Contributor
16 years ago
Why is the speed of operations not constant? Is the cache the reason?
Hi
First of all, I'm using Quartus 8.1 and the NiosII IDE 8.1. The Nios II system I set up processes data from an ADC with the help of an FFT module that I added to the processor system. The output of the FFT module is read by a DMA controller, which copies the FFT results to an internal buffer (512 x 32 bit). Each FFT output word consists of a 16-bit real part and a 16-bit imaginary part. The goal is to calculate the magnitude of 4 bundles of data from the real and imaginary parts of each bin.

This is how I do it:

1. Input the first bundle of data to the FFT module.
2. Read out the FFT module with the DMA.
3. Calculate the magnitude of all bins.
4. Repeat these steps for the other 3 bundles of data.

The entire source code runs from an external SRAM, which requires 30 ns per access (3 cycles in my system, which runs at 100 MHz).

Here is the source code of the magnitude calculation for one bundle of data:

    for (SampleCounter_u16 = 0; SampleCounter_u16 < 512; SampleCounter_u16++)
    {
        /* extract 16bit real and imaginary from 32bit value */
        Real_s16 = (int16_t)(samples_u32sa[SampleCounter_u16] & 0xFFFF);
        Imag_s16 = (int16_t)((samples_u32sa[SampleCounter_u16] >> 16) & 0xFFFF);

        /* typecast with custom instruction */
        Real_f32 = CI_INTTOFLOAT(Real_s16);
        Imag_f32 = CI_INTTOFLOAT(Imag_s16);

        /* calculate magnitude and scale */
        Result1_f32a[SampleCounter_u16] = CI_FPSQRT(Real_f32*Real_f32 + Imag_f32*Imag_f32)/STEP_SIZE_1MILIG;
    }

The problem is that the execution time of this code is not constant. I run this loop 4 times to calculate the magnitude of all 4 data bundles, and the overall time of each run is different (measured with the timestamp timer).

The next step was to measure the execution time of each line of code. For example, the line "Real_f32 = CI_INTTOFLOAT(Real_s16);" should have a constant execution time, but it doesn't. In one of the four loops the operation takes longer than in the other three: during the first, second and fourth loop it takes 20 ns on each of the 512 iterations, while during the third loop it takes 100 ns on each of the 512 iterations.

This is very strange. At first I thought the timer could be the reason, but then I changed from the NiosII-f to the NiosII-e processor and the phenomenon disappeared. With the NiosII-s and -f this strange behaviour occurs. I assume the caching of the NiosII-s and -f is the reason for this, but I'm not sure. Does anybody have an idea about this?