Forum Discussion

Altera_Forum
Honored Contributor
15 years ago

Is Nios II slow, or am I doing something wrong?

Hi all,

I am playing with a Cyclone II chip. It sits on a custom board together with an ATMEGA16 running at 12 MHz; the FPGA runs from a 50 MHz master clock.

The FPGA transmits data from its internal dual-port memory over LVDS to another board carrying lots of RGB LEDs.

Just for fun and for learning purposes, I wrote a small C program that calculates the Mandelbrot fractal. First I compiled it with GNU C for the ATMEGA. Then I copy-pasted the same code to Nios II.

The differences between the two tests:

a) When running from the ATMEGA, the CPU clock is 12 MHz and data is passed to the FPGA over 4 wires plus a control line. The two nibbles are joined in the FPGA and the RAM address is calculated in hardware.

b) When running from Nios II, the CPU clock is 50 MHz, data is passed to the RAM over 8 wires, and the RAM address is passed directly over 12 wires, so no address calculation is needed. Nios II is configured with hardware float multiplication. All RAM is on-chip.

Guess who is faster?

I didn't make an exact measurement, but BOTH systems look like they are running at the same speed! :eek:

What is wrong?

Testing software:

#include "main.h"   /* assumed to pull in system.h and altera_avalon_pio_regs.h */

static float zeta;  /* frame counter, drives the zoom */

/* Write one byte to the dual-port RAM: set address and data, then pulse PORTA (1, then 0). */
void pushbyte(char x, int adr)
{
    IOWR_ALTERA_AVALON_PIO_DATA(RAMADR_BASE, adr);
    IOWR_ALTERA_AVALON_PIO_DATA(PORTB_BASE, x);
    IOWR_ALTERA_AVALON_PIO_DATA(PORTA_BASE, 1);
    IOWR_ALTERA_AVALON_PIO_DATA(PORTA_BASE, 0);
}

/* Compute one 80x24 Mandelbrot frame and push it out as RGB bytes. */
void pumprgb(void)
{
    int adr;
    float cx, cy, scale, a1, b1, a2, b2, ax, ay;
    signed int x, y;
    long color;
    float a12, b12;
    int limit;
    int lp;

    cx = -1.52;
    cy = 0;
    scale = 0.05001 - (zeta * 0.002);
    limit = 4;
    if (zeta > 128) { zeta = 0; } // this one doesn't fit! 100 bytes are needed.

    y = -12;
    while (y < 12)
    {
        x = -40;
        ay = cy + y * scale;
        while (x < 40)
        {
            ax = cx + x * scale;

            b1 = ay;
            a1 = ax;
            a12 = a1 * a1;
            b12 = b1 * b1;
            lp = 0;
            while ((lp < 255) && ((a12 + b12) < limit))
            {
                lp++;
                a12 = a1 * a1;
                b12 = b1 * b1;

                a2 = a12 - b12 + ax;
                b2 = 2 * a1 * b1 + ay;
                a1 = a2;
                b1 = b2;
            }
            color = lp;
            color = color * 45536;
            adr = ((int) ((x + 40) * 3 + ((y + 12) * 240))) & 0x1FFF;
            //adr=adr & 0x1FFF;
            pushbyte((color >> 16), adr);    // red
            pushbyte((color >> 8), adr + 1); // green
            pushbyte(color, adr + 2);        // blue
            x++;
        }
        y++;
    }
    zeta++;
}

int main(void)
{
    zeta = 0;
    while (1)
    {
        pumprgb();
    }
    return (0);
}
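
One thing worth checking in the code above: in C, unsuffixed constants such as 0.002 and 0.05001 are doubles, so the scale expression is evaluated in double precision, and comparing a float sum against the int limit forces an int-to-float conversion in the loop test. Below is a minimal sketch of the same escape-time loop kept entirely in single precision. The function names mandel_iters and frame_scale are made up for illustration, and whether this change explains the speed seen on the board has not been measured here.

```c
/* Sketch only: same iteration as the posted inner loop, with float-only types.
 * mandel_iters and frame_scale are hypothetical names, not from the post. */

/* Zoom factor for a given frame counter, using float literals (f suffix)
 * so the arithmetic is not promoted to double. */
static float frame_scale(float zeta)
{
    return 0.05001f - (zeta * 0.002f);
}

/* Number of iterations (0..255) before the point (ax, ay) escapes. */
static int mandel_iters(float ax, float ay)
{
    const float limit = 4.0f;  /* float, so the loop test needs no int conversion */
    float a1 = ax;
    float b1 = ay;
    float a12 = a1 * a1;
    float b12 = b1 * b1;
    int lp = 0;

    while ((lp < 255) && ((a12 + b12) < limit)) {
        float a2, b2;
        lp++;
        a12 = a1 * a1;
        b12 = b1 * b1;
        a2 = a12 - b12 + ax;
        b2 = 2.0f * a1 * b1 + ay;
        a1 = a2;
        b1 = b2;
    }
    return lp;
}
```

In pumprgb, the body of the innermost while loop would then collapse to lp = mandel_iters(ax, ay), with scale computed once per frame from frame_scale(zeta).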

14 Replies

  • Altera_Forum
    Honored Contributor

    It is a very interesting post, especially the part about FP comparisons.

    For the AVR I am using the default settings and the compiler from the winavr package (http://winavr.sourceforge.net/) - no fancy libraries, no hand optimization.

    Have you compiled the Nios version with hardware floating point?

    My compiled code size is about 2 KB for Nios.

    Maybe Nios is not slow, but the C compiler is bad? :)
  • Altera_Forum
    Honored Contributor

    The hand optimized libm.a is part of the winavr package. How big was your compiled AVR code? Using libm.a my result was about 1.5k. Without libm.a the size was almost 4k.

    I did compile Nios with hardware floating point, the code was just under 2k.

    I don't think the main problem is the C compiler, but more the C library and lack of hardware for more FP operations. The default floating point implementations provided in the C library are not very efficient and not optimized for Nios. Even with hardware floating point turned on, this only accelerates the basic math functions (+, -, *, /), not things such as comparisons or conversions to or from integers.

    I did come across this alternate custom floating point unit (http://www.nioswiki.com/custom_floating_point_unit). I suspect it could make a big difference for your test program.

    I am kinda surprised that the Altera FP unit does not include acceleration for more FP operations.

    Of course, if you really want to see the FPGA shine, you should create application specific acceleration. I stumbled across a site about an fpga based mandelbrot generator (http://markbowers.org/home/fpga-mandelbrot) you may find interesting.
  • Altera_Forum
    Honored Contributor

    Yup Levas, Kevin is right. It all comes down to a true apples to apples comparison. Otherwise, you're deluding yourself...and others. Be sure that you're including all details, not just the ones you're familiar with...

    The Mandelbrot site is interesting. There's a comparable one (http://www.altera.com/support/examples/nios2/exm-c2h-mandelbrot.html) on Altera's site. In this example, the Nios II (with C2H generated hardware acceleration) blows away most any normal CPU, including the one on your desktop PC.

    That's the real strength of using softcore CPUs in FPGAs. Don't like your current performance? Well, because you're now in the squishy software/hardware world of FPGAs, you just gave yourself something beyond hand optimized assembly to reach your goals.

    Cheers,

    --slacker
  • Altera_Forum
    Honored Contributor

    Is your app being compiled with -O2, as well as the BSP/libc bits?

    I gave up looking for the relevant config option and hacked the makefile.
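
The original post notes that no exact measurement was made, so comparing builds (for example with and without -O2) comes down to timing one frame on each system. Here is a rough sketch using only the standard C clock() function. The helper seconds_per_call and the stand-in workload dummy_frame are hypothetical names, and on the actual boards a hardware timer would probably have to replace clock(), which may not be available or meaningful there.

```c
#include <time.h>

/* Hypothetical helper (not from the thread): average CPU seconds per call of fn. */
static double seconds_per_call(void (*fn)(void), int repeats)
{
    clock_t start = clock();
    for (int i = 0; i < repeats; i++) {
        fn();
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC / repeats;
}

/* Stand-in workload so the sketch compiles on its own;
 * in practice this would be one call that renders a single frame. */
static volatile float sink;

static void dummy_frame(void)
{
    float acc = 0.0f;
    for (int i = 0; i < 1000; i++) {
        acc += (float)i * 0.5f;
    }
    sink = acc;  /* volatile store keeps the loop from being optimized away */
}
```

Calling seconds_per_call(dummy_frame, 100) on each build gives a per-frame figure that can be compared directly, instead of judging the speed by eye.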