Forum Discussion

Altera_Forum
12 years ago

Tightly Coupled OnChip DMA

Hi,

I'm seeing some very strange performance behavior with a tightly coupled on-chip memory used for data storage. To speed up my design, I created a tightly coupled on-chip memory in which I calculate coefficient values in one portion while a DMA controller copies new data into another portion.

In my first attempt, I used the DMA synchronously: I waited for the DMA to finish copying and then did my calculations.

In my second attempt, I do my calculations while the copy is in progress, but this gives me zero performance gain. How can that be? It almost seems as if the Nios stalls while the copy is taking place.
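For reference, the overlapped scheme described above is usually called ping-pong (double) buffering. Below is a minimal sketch of the buffer rotation only. `dma_start()`, `dma_wait()`, `compute()`, `tcm_buf` and the block sizes are hypothetical names, not the poster's code or an Altera driver API. The DMA calls are stubbed with `memcpy` so the sketch compiles on a host PC, which means it shows the ordering of operations but not real concurrency.

```c
#include <string.h>

#define BLOCK_LEN  4   /* hypothetical words per block */
#define NUM_BLOCKS 3   /* hypothetical number of blocks to process */

/* Hypothetical DMA stand-ins: on real hardware dma_start() would program
   the DMA controller and return at once, and dma_wait() would poll for
   completion. Here the copy is synchronous so the sketch runs anywhere. */
static void dma_start(int *dst, const int *src, size_t n)
{
    memcpy(dst, src, n * sizeof *dst);
}

static void dma_wait(void)
{
    /* Nothing to wait for: the stub copy has already finished. */
}

/* Placeholder for the coefficient calculation: sums one block. */
static long compute(const int *buf, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}

/* Two halves of the (hypothetical) tightly coupled data buffer. */
static int tcm_buf[2][BLOCK_LEN];

long process_all(const int src[NUM_BLOCKS][BLOCK_LEN])
{
    long total = 0;
    int cur = 0;

    /* Prime the first half before any computation starts. */
    dma_start(tcm_buf[cur], src[0], BLOCK_LEN);
    dma_wait();

    for (int blk = 0; blk < NUM_BLOCKS; blk++) {
        int next = cur ^ 1;
        /* Start filling the other half with the next block... */
        if (blk + 1 < NUM_BLOCKS)
            dma_start(tcm_buf[next], src[blk + 1], BLOCK_LEN);
        /* ...while the CPU works on the current half. */
        total += compute(tcm_buf[cur], BLOCK_LEN);
        /* Make sure the next half is complete before switching to it. */
        if (blk + 1 < NUM_BLOCKS)
            dma_wait();
        cur = next;
    }
    return total;
}
```

With blocks {1, 2, 3, 4}, {5, 6, 7, 8} and {9, 10, 11, 12}, `process_all()` returns 78. Whether the overlap actually saves time on hardware depends on the real DMA and memory system, which this stub does not model.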

Can anyone give me a good explanation of what's going on?

Thanks


  • Altera_Forum

    The code I run on the Nios is carefully compiled without any actual function calls (they are all inlined) in order to give the compiler more registers to work with.
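    With GCC, one way to get the fully inlined code described here is to mark helpers `static inline` with `__attribute__((always_inline))`, which asks GCC to inline them even when it otherwise would not. A minimal sketch; `mac()` and `dot()` are hypothetical examples, not the poster's code.

```c
/* Hypothetical helper, forced inline so no call/return sequence or
   register save/restore is emitted at the call site. */
static inline __attribute__((always_inline))
int mac(int acc, int a, int b)
{
    return acc + a * b;
}

/* Dot product whose loop contains no real function call once mac()
   has been inlined. */
int dot(const int *a, const int *b, int n)
{
    int acc = 0;
    for (int i = 0; i < n; i++)
        acc = mac(acc, a[i], b[i]);
    return acc;
}
```

    For example, `dot()` on {1, 2, 3} and {4, 5, 6} returns 32.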

    All global data is accessed using 16-bit offsets from the global pointer; this also significantly reduces register pressure.

    You do get better code for global arrays if you use %gp as a register variable pointing to the array (and for global structs if you have built gcc with my patches).

    I've also disabled the dynamic branch prediction logic to get guaranteed branch timings.

    With code and data in tightly coupled memory, the measured timings then match the calculated ones.

    I found only one undocumented pipeline stall: there is a 1-cycle stall for a read following a write to the same tightly coupled data memory.