If you stick with FIR filters, the most straightforward solution is to use the Altera FIR Compiler. Generate a filter with reloadable coefficients and you'll get a port through which you can load coefficients serially (one coefficient at a time). This hooks up to Avalon very easily with a PIO if you don't have time-critical reload constraints. Otherwise you could use the arlut_fifo_interface to connect to the port, and you will be able to write directly to the port without handshaking your way through the load. arlut_fifo_interface is published here on the Nios forum. Filters with more than 100 taps running at 100 MHz are possible. If you want a lower order, simply load the excess coefficients as 0, or build a new block with FIR Compiler. During a reload you will have to discard the data coming from the filter.
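To make the "one coefficient at a time, pad with zeros" idea concrete, here is a minimal sketch in C. It is not taken from any Altera example: FIR_NUM_TAPS, coef_write_fn, fir_reload and the capture stub are all hypothetical names, the 16-bit coefficient width is an assumption, and the actual port write (PIO or FIFO, plus whatever handshake your FIR Compiler version needs) is hidden behind a callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: number of taps the filter was generated with. */
#define FIR_NUM_TAPS 128

/* Hypothetical hook that pushes one coefficient word into the reload port.
 * On a real system this would wrap the PIO (or FIFO) write plus whatever
 * handshake the generated filter needs. */
typedef void (*coef_write_fn)(int16_t coef, void *ctx);

/* Write n coefficients one at a time, then zero-pad up to FIR_NUM_TAPS.
 * Returns the number of words written, or -1 on bad arguments. */
int fir_reload(const int16_t *coefs, size_t n, coef_write_fn write_coef, void *ctx)
{
    if (coefs == NULL || write_coef == NULL || n > FIR_NUM_TAPS)
        return -1;

    for (size_t i = 0; i < FIR_NUM_TAPS; i++)
        write_coef(i < n ? coefs[i] : 0, ctx);

    return FIR_NUM_TAPS;
}

/* Host-side stand-in for the hardware port: records what was written. */
struct capture {
    int16_t words[FIR_NUM_TAPS];
    size_t count;
};

void capture_write(int16_t coef, void *ctx)
{
    struct capture *cap = ctx;
    assert(cap->count < FIR_NUM_TAPS); /* never write more than the filter holds */
    cap->words[cap->count++] = coef;
}
```

On the target you would replace capture_write with a function that does the real port write, and throw away the filter output until the last word has gone in.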
With FIR Compiler you can play with different filter structures to get a good match between resources and performance. One important factor for resource usage is the clock latency, i.e. how many clock cycles the filter gets per input sample. One sample per clock requires the most logic cells.
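A back-of-the-envelope way to see the trade-off, assuming a simple model where each multiplier is time-shared across the clock cycles available per sample. This is only a rough estimate (it ignores coefficient symmetry and whatever optimisations the tool applies), not what FIR Compiler will actually report, and the function name is made up for illustration.

```c
#include <assert.h>

/* Rough estimate of multipliers needed when every multiplier is reused
 * once per clock cycle within one sample period: ceil(taps / clocks). */
unsigned fir_multipliers_needed(unsigned taps, unsigned clocks_per_sample)
{
    assert(clocks_per_sample > 0); /* zero clocks per sample is meaningless */
    return (taps + clocks_per_sample - 1) / clocks_per_sample;
}
```

With 100 taps, one sample per clock comes out at 100 multipliers in this model, while allowing 4 clocks per sample brings it down to 25.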
I have not used the Altera IIR compiler, so I don't know about reloading there.
Async RAM with Nios: I would guess it's as simple as adding one of the standard external RAM blocks in SOPC Builder. They are async ones, used in most of the kits. If not, use an Avalon tristate bridge.