You probably have your work cut out for you just getting started, so keeping it simple is the way to go: e.g. an Avalon-MM Slave interface that your NIOS software uses to write the 32-bit registers one at a time. As dsl said, if it's academic you can be satisfied knowing that you could always make it faster if you chose.
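To make the "write the registers one at a time" idea concrete, here is a minimal sketch in C. The register layout (eight operand words followed by a control word with a start bit) and the function name are assumptions made up for illustration, not anything from an existing component. On a real HAL build you would more likely go through the IOWR macros in io.h so the accesses bypass the data cache; a plain volatile pointer is used here so the sketch also runs against an ordinary array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical word offsets inside the slave's register block. */
#define REG_OPERAND_BASE  0u  /* operand words start here */
#define REG_OPERAND_COUNT 8u  /* assumed number of operand registers */
#define REG_CONTROL       8u  /* assumed: bit 0 = start */

/* Write the operand words one 32-bit access at a time, then set the
   start bit. Returns the number of bus writes that were issued. */
size_t accel_load_and_start(volatile uint32_t *regs,
                            const uint32_t *operands,
                            size_t nwords)
{
    assert(regs != NULL && operands != NULL);
    assert(nwords <= REG_OPERAND_COUNT);

    for (size_t i = 0; i < nwords; ++i) {
        regs[REG_OPERAND_BASE + i] = operands[i];
    }
    regs[REG_CONTROL] = 1u;  /* kick off the operation */
    return nwords + 1;
}
```

Every operand costs one separate 32-bit write from the CPU, which is exactly the overhead the faster approaches try to remove.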
Once you get it working, the performance you achieve will depend on how much work (and complication) you want to invest. As Daixiwen already noted, the 32-bit nature of the NIOS is a significant bottleneck. Although SGDMA is a bit better, you are probably on the right track thinking about Avalon-MM Master interfaces with a larger bus width. For example, you could implement a bursting master with a 64/128/256-bit width to stream operands (and opcodes, if you like) from SDRAM. Ideally, limit the NIOS to control activity only.
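As a rough feel for why the wider master helps, the sketch below counts how many data beats a transfer needs at a given bus width. It is only back-of-the-envelope arithmetic: it assumes one beat per clock and ignores SDRAM latency, wait states and arbitration, so real numbers will be worse. The function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bus beats needed to move `bytes` over a data path that is
   `width_bits` wide, rounding up for a partial final beat. */
uint32_t beats_for_transfer(uint32_t bytes, uint32_t width_bits)
{
    assert(width_bits >= 8u && width_bits % 8u == 0u);

    uint32_t bytes_per_beat = width_bits / 8u;
    return (bytes + bytes_per_beat - 1u) / bytes_per_beat;
}
```

For a 4096-byte block of operands this gives 1024 beats at 32 bits against 128 beats at 256 bits, before counting any per-access CPU overhead.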
If you only have a handful of operands that you use frequently, you could look into using dual-port on-chip memory, with the NIOS/SGDMA on one port for reading/writing results and the other port dedicated to your Avalon-MM Master. If it will fit in your device, a RAM width of 1024 bits might provide the highest throughput.
As for how to control your new IP, you could do something like add custom instructions that take 32-bit addresses (pointers) to the operands; your new logic would then fetch the operands and store the results independently.
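From the software side, a pointer-taking custom instruction could look roughly like the sketch below. It uses GCC's Nios II builtin `__builtin_custom_inpp` (int result, two pointer operands) when compiled for the NIOS, and a software stand-in otherwise so the calling code can be tried on a PC. The instruction index, the four-word operand block and the "sum the operands" behaviour are all assumptions for illustration; your logic would do whatever you design it to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ACCEL_CI_N  0  /* hypothetical custom-instruction index */
#define ACCEL_WORDS 4  /* hypothetical operand count per call */

#if defined(__nios2__) || defined(__NIOS2__)
/* Real target: hand both addresses to the hardware in one instruction. */
#define ACCEL_RUN(src, dst) __builtin_custom_inpp(ACCEL_CI_N, (src), (dst))
#else
/* Host stand-in for the hardware: read ACCEL_WORDS operands from src,
   store their wrapping sum at dst, return 0 as a status code. */
int accel_model(void *src, void *dst)
{
    const uint32_t *in = (const uint32_t *)src;
    uint32_t *out = (uint32_t *)dst;
    uint32_t sum = 0;

    assert(in != NULL && out != NULL);
    for (int i = 0; i < ACCEL_WORDS; ++i) {
        sum += in[i];
    }
    out[0] = sum;
    return 0;
}
#define ACCEL_RUN(src, dst) accel_model((src), (dst))
#endif

/* The CPU only passes pointers; the data itself never goes through it. */
int accel_sum(uint32_t *operands, uint32_t *result)
{
    return ACCEL_RUN(operands, result);
}
```

One thing to watch on real hardware: if your NIOS has a data cache, the operands have to be flushed to memory before the logic fetches them (the HAL provides alt_dcache_flush for this), and the result buffer must not be read back from a stale cache line.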