Hello, I want to make my design run uCLinux as fast as possible. My software currently takes 40 ms and should take 8 ms. I am using Linux with an MMU, a 50 MHz clock, 32 KB data and instruction caches, and 1 KB of tightly coupled memory. In make menuconfig I've selected ENABLE MUL INSTRUCTION (is mulx better?). What can I do to get better performance? Any tips? Thanks


Achieving higher performance? | Altera Community

21 Replies

Altera_Forum
Honored Contributor
14 years ago
You can run the NIOS much faster than 50 MHz. We are running in a Cyclone III at 96 MHz with no problems and have had it running as high as 168 MHz.

That would be the first change.

The other is to figure out what the longest computational path is and, if you can, add a hardware engine that does that specific function.

I.e. if you are spending most of your time computing an FFT, add a hardware FFT and just pass it the data and look at the results.

This will require a "Software/Hardware" interface, and won't usually be just plug and play, so you'll have to know how to work in Verilog or VHDL to make it work, but it can give you significant improvements.

Pete
Altera_Forum
Honored Contributor
14 years ago
Make sure you have compiled everything with -O2 or -O3.
Look at the generated code for your critical loops and check the compiler has made a reasonable job of it - and isn't spilling locals to the stack all the time. If it is, change the C to give it a better chance.
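
As a rough illustration of "change the C to give it a better chance", here is a minimal sketch with hypothetical function names. In the first version the running total lives behind a pointer that may alias the input array, so the compiler generally has to load and store it on every iteration. The second keeps the total in a local. Whether your compiler really generates worse code for the first form is something to confirm in the disassembly, not something this sketch proves.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: *sum may alias data[], so the compiler
 * typically has to load and store *sum on every iteration. */
void sum_via_pointer(const int32_t *data, size_t n, int32_t *sum)
{
    *sum = 0;
    for (size_t i = 0; i < n; i++)
        *sum += data[i];
}

/* Same result as long as sum does not point into data[], but the
 * running total is a local that can stay in a register; memory is
 * written once at the end. */
void sum_via_local(const int32_t *data, size_t n, int32_t *sum)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += data[i];
    *sum = acc;
}
```

Both functions give the same answer for a normal call such as sum_via_local(buf, len, &total); the difference, if any, shows up only in the generated loop body.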
Altera_Forum
Honored Contributor
14 years ago
How can I compile with -O2 or -O3 using MMU Linux DSL?

Also, I have an SDRAM in my design (DE2-70 board) and it's running at 100 MHz while my NIOS is at 50 MHz.
If I try to raise my NIOS clock I get either "failed to verify", an Oops with "kernel not syncing", or nothing at all shows up when I run nios2-terminal.

I also tried adding a floating-point unit in Qsys and enabling hardware divide, but it didn't make a big difference.

My code mainly does floating-point operations and some I/O.
Altera_Forum
Honored Contributor
14 years ago
I've tried the -O3 and -O2 flags and the result still isn't really good (these flags are only in my application Makefile).

To be honest, the only thing that made a difference was changing the data/instruction caches from 32 KB to 64 KB.

I really think I am doing something wrong when compiling my Linux image.
Altera_Forum
Honored Contributor
14 years ago
I'd have thought you'd be able to run at 100MHz, maybe something is wrong with the clock sources when you are trying to do that.
Getting the clocking right is tricky (I'm no expert), ISTR some magic -3.3ns value and also using the memory clock as the master clock...
If increasing the cache sizes makes a significant difference, then you must be thrashing the caches - probably worth determining whether it is the data or instruction cache.
Floating point will be slow. If you've got the FPGA real estate, the FP custom instructions will help float (but not double) operations.
One option is to convert your floating point to fixed point - then use integer operations. For that to work well you'll really want the mulx instructions (which seem to be only available with DSP multipliers), and maybe a custom instruction to extract the required 32 bits from the 64-bit product.
Thinks - would a 32x32 full adder array execute in a single clock to perform a multiply (throwing gates at it!)?
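
To make the fixed-point suggestion a bit more concrete, here is a minimal sketch assuming a Q16.16 format (16 integer bits, 16 fraction bits). The format and the names q16_16_t, q16_from_float, q16_to_float and q16_mul are my own choices for illustration, not anything from the thread. The multiply forms the full 64-bit product and keeps 32 bits out of the middle of it, which is the step where an upper-half multiply such as mulx would matter; it is worth confirming in the disassembly what the compiler actually emits for it.

```c
#include <stdint.h>

typedef int32_t q16_16_t;            /* hypothetical Q16.16 type */
#define Q16_ONE ((q16_16_t)1 << 16)  /* 1.0 in Q16.16 */

/* Convert a float to Q16.16 (rounds to nearest; no overflow check). */
static inline q16_16_t q16_from_float(float x)
{
    return (q16_16_t)(x * (float)Q16_ONE + (x >= 0.0f ? 0.5f : -0.5f));
}

/* Convert back to float, mainly useful for debugging and printing. */
static inline float q16_to_float(q16_16_t x)
{
    return (float)x / (float)Q16_ONE;
}

/* Multiply: 32x32 -> 64-bit product, then drop the 16 extra fraction
 * bits. Assumes >> on a negative value is an arithmetic shift, which
 * is implementation-defined in C but is what GCC does. */
static inline q16_16_t q16_mul(q16_16_t a, q16_16_t b)
{
    int64_t p = (int64_t)a * (int64_t)b;
    return (q16_16_t)(p >> 16);
}
```

With this, 1.5 * 2.0 becomes q16_mul(q16_from_float(1.5f), q16_from_float(2.0f)), and only integer instructions run in the inner loop once the inputs have been converted.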
Altera_Forum
Honored Contributor
14 years ago
My clock configuration in Qsys is correct. The SDRAM clock is -67 degrees from my sys_clock and it is running at a supported speed (I took a look at the datasheet).

I made this SDC file:
create_clock -period 20.000 -name clkin_50
derive_pll_clocks
derive_clock_uncertainty
Altera_Forum
Honored Contributor
14 years ago
I took a look and my fmax is only 76 MHz on a Cyclone IV (DE2-115).
This is bad... maybe it is because of my design size?
1,549,000 memory bits / 20k LEs?

I have two processors, a PHY, a lot of on-chip memory...
Altera_Forum
Honored Contributor
14 years ago
I removed everything related to the second processor and my fmax went from 80 MHz to 115 MHz.

I used the custom instruction compiler flags from the Altera Wiki and my time went from 26 ms to 7 ms. Really awesome.
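
Since the FP custom instructions only cover float and not double (as noted earlier in the thread), it is also worth checking that the hot code does not silently promote to double. A minimal sketch with hypothetical function names: in C an unsuffixed constant such as 0.5 has type double, so the first function does its arithmetic in double precision, while the second stays in float. The exact compiler options that map float operations onto custom instructions are toolchain-specific (in GCC for Nios II I believe they are the -mcustom-* family), so treat that part as an assumption to check against your own setup.

```c
/* Hypothetical scaling routine: 0.5 and 1.0 are double constants, so x
 * is promoted to double, the math runs in double precision, and the
 * result is converted back to float on return. */
float scale_promoted(float x)
{
    return x * 0.5 + 1.0;
}

/* Same formula with float constants (f suffix): every operation stays
 * in single precision, which is the case the float custom
 * instructions are meant to handle. */
float scale_single(float x)
{
    return x * 0.5f + 1.0f;
}
```

GCC's -Wdouble-promotion warning can help find the first pattern in existing code.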
Altera_Forum
Honored Contributor
14 years ago
I'm surprised the 2nd processor made that much difference to fmax. Maybe your device was getting full so some long tracks were being used.
Altera_Forum
Honored Contributor
14 years ago
Yeah, I am surprised too...
How can I add pipeline bridges to improve my fmax even more?
