If you were hoping to see the results of the DSPs in your code every cycle, you won't. The software side adds latency, and how much depends on which NIOS core you use and whether you use interrupts. Your best bet is to implement a larger part of your algorithm in HDL, so that the hardware produces results (and consumes input) at a slower rate, slow enough that the software latencies are no longer a factor. In that case you would use a DMA-like scheme: (1) load a block of data into memory (preferably FPGA RAM, as suggested above), (2) let the FPGA do all the calculations, and (3) DMA the results back to CPU RAM and use them there.
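To make the three steps concrete, here is a rough host-side sketch in C of the block-at-a-time flow. Everything in it is a stand-in I made up for illustration: `fpga_in`, `fpga_out`, `fpga_process_block`, `process_block` and `BLOCK_LEN` are hypothetical names, the "hardware" is just a loop, and `memcpy` takes the place of the DMA transfers. On a real system you would replace those pieces with your own on-chip RAM regions, your HDL block and whatever DMA controller your design has.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical block size; pick whatever fits your on-chip RAM. */
#define BLOCK_LEN 256

/* Stand-ins for memory the hardware would own. On a real system these
 * would be FPGA RAM regions mapped into the CPU's address space. */
static uint32_t fpga_in[BLOCK_LEN];
static uint32_t fpga_out[BLOCK_LEN];

/* Placeholder for the HDL datapath (assumption: one multiply-add per
 * sample). In hardware this runs on its own; software only starts it
 * and waits for completion. Unsigned math wraps instead of overflowing. */
static void fpga_process_block(size_t n)
{
    for (size_t i = 0; i < n; i++)
        fpga_out[i] = fpga_in[i] * 3u + 1u;
}

/* One round trip: load, compute, copy back.
 * Returns 0 on success, -1 if the block does not fit. */
int process_block(const uint32_t *src, uint32_t *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    if (n > BLOCK_LEN)
        return -1;

    memcpy(fpga_in, src, n * sizeof *src);   /* step 1: load data (DMA in)  */
    fpga_process_block(n);                   /* step 2: FPGA does the math  */
    memcpy(dst, fpga_out, n * sizeof *dst);  /* step 3: results back (DMA)  */
    return 0;
}
```

The point of the structure is that the software overhead (call, interrupt, transfer setup) is paid once per block of `BLOCK_LEN` samples instead of once per sample, so the per-sample cost of the software latency shrinks as the block grows.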