There really is no hard answer to this. It depends a lot on how your code is written and on the peripheral you are interfacing with. The Nios II Processor Reference indicates that any load or store instruction should take 1 clock cycle if it does not involve an Avalon transfer, and more than 1 if it does, which is not very informative.
In my own experience, reading data from an Avalon peripheral and storing it into an array in memory, with the transfers viewed on an oscilloscope, took 3 clocks per transfer. But again, this depends on the implementation.
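If you want to check the figure on your own system, something along these lines is a rough sketch of the kind of read loop I mean, plus a helper that turns two cycle-counter samples into an average. The names `read_block` and `cycles_per_transfer` are made up for illustration, and where the counter samples come from (a timer, a performance counter, or a scope measurement converted to clocks) is left as an assumption. On a core with a data cache, the peripheral read would also have to bypass the cache, which plain `volatile` does not guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy n words from one peripheral register into a
 * RAM buffer. `src` is volatile so the compiler performs every read. */
static void read_block(volatile const uint32_t *src, uint32_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = *src;  /* one peripheral read, then one store to memory */
    }
}

/* Hypothetical helper: average clocks per transfer from two samples of a
 * free-running 32-bit cycle counter. Unsigned subtraction keeps the result
 * correct across a single counter wraparound. */
static double cycles_per_transfer(uint32_t start, uint32_t end, size_t n)
{
    assert(n > 0);
    return (double)(uint32_t)(end - start) / (double)n;
}
```

Sample the counter just before and just after the call to `read_block`, then pass both values and the word count to `cycles_per_transfer`. The result includes loop overhead, so expect it to change with the optimization level and with how the compiler schedules the loads and stores.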