With a hardware-based FFT you would normally use a DMA engine to move the samples from the SRAM into the FFT core and then move the results back out to memory. With a software implementation the code simply accesses the data in memory directly.
This zip file contains a file called sw_only_fft.c, which is a radix-2 FFT that can be run on Nios II:
http://www.altera.com/literature/tt/c2h_tutorial.zip
It is by no means optimized, but if you search around you'll find it's very similar to other code fragments floating around out there. If you study the code you can count how many operations are performed per block of input data, which should give you a rough idea of how long it will take to run on a Nios II core.
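
To make the operation counting concrete, here is a minimal sketch of the estimate. It is not taken from sw_only_fft.c. It assumes a textbook radix-2 FFT on a power-of-two block size N, which has log2(N) stages of N/2 butterflies each, with one complex multiply and two complex add/subtracts per butterfly (roughly 4 real multiplies and 6 real adds). The function names and the cycles_per_butterfly parameter are made up for illustration. You would have to work out the real per-butterfly cost from the compiled code and your Nios II configuration (hardware multiplier or not, cache and memory latency, fixed vs floating point).

```c
#include <stdint.h>

/* Number of stages in a radix-2 FFT: log2(n).
 * Assumes n is a power of two, as radix-2 requires. */
static unsigned fft_stages(uint32_t n)
{
    unsigned stages = 0;
    while (n > 1) {
        n >>= 1;
        stages++;
    }
    return stages;
}

/* Total butterflies for an n-point radix-2 FFT:
 * n/2 butterflies per stage, log2(n) stages. */
uint32_t fft_butterflies(uint32_t n)
{
    return (n / 2) * fft_stages(n);
}

/* Rough cycle estimate. cycles_per_butterfly is an ASSUMED figure
 * that you supply (loads, stores, multiplies, adds, loop overhead);
 * it is not a measured Nios II number. Bit reversal is ignored here. */
uint64_t fft_cycle_estimate(uint32_t n, uint32_t cycles_per_butterfly)
{
    return (uint64_t)fft_butterflies(n) * cycles_per_butterfly;
}
```

For example, a 1024-point block has 10 stages of 512 butterflies, so 5120 butterflies in total. If you guessed 100 cycles per butterfly, fft_cycle_estimate(1024, 100) gives 512000 cycles. Divide that by your clock frequency to get a ballpark time per block.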