Forum Discussion
Altera_Forum
Honored Contributor
16 years ago
If your system runs at 100 MHz, then one clock cycle is 10 ns.
You say that loops 1, 2 and 4 take 20 ns (2 clocks) for all 512 iterations, but loop 3 takes 100 ns (10 cycles) for all 512 loop steps. If an interrupt, a refresh cycle or something similar occurred, you would measure that additional time in all four loops and not only in loop 3. This is indeed strange. Have you set up SignalTap and monitored what is going on in each of the four loops, especially in those custom instructions?

Another question. You wrote:

    Real_f32 = CI_INTTOFLOAT(Real_s16);
    Imag_f32 = CI_INTTOFLOAT(Imag_s16);
    Result1_f32a[SampleCounter_u16] = CI_FPSQRT(Real_f32*Real_f32 + Imag_f32*Imag_f32)/STEP_SIZE_1MILIG;

So you have two float multiplications and one float addition. Why don't you write:

    Result1_f32a[SampleCounter_u16] = CI_FPSQRT( CI_INTTOFLOAT( Real_s16 * Real_s16 + Imag_s16 * Imag_s16 ) )/STEP_SIZE_1MILIG;

Now you have two integer multiplications and one integer addition. This should be a bit faster, as only one CI_INTTOFLOAT operation is needed.