You should definitely be able to get a higher sample rate than that. You could try a longer chain if you can handle several buffers at the same time. Which Nios CPU are you using, and at what frequency? Do you compile your code with optimization (-O2)?
Do you call the do_async_transfer() function as early as possible in the callback? To get the highest throughput it is important to call it with as little delay as possible. Your callback should first look for a new buffer for the next transfer, then set up the DMA for the async transfer, and only then process the received packet. That way the DMA can begin loading the next packet while you process the current one.
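To make the ordering concrete, here is a rough sketch of the callback structure I mean. It is plain C with the DMA simulated, so it runs on a PC and is not your actual driver code. The names acquire_free_buffer(), start_async_rx(), process_packet(), rx_start() and rx_callback() are made up for the illustration, and start_async_rx() is only a stand-in for wherever you call do_async_transfer(). The buffer count and size are assumptions too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BUFFERS 4      /* assumed pool size */
#define BUFFER_SIZE 1536   /* assumed: room for one full Ethernet frame */
#define LOG_SIZE    16

/* Event log, only here so the order of the steps can be checked. */
enum event { EV_DMA_ARMED, EV_PACKET_PROCESSED };
static enum event event_log[LOG_SIZE];
static size_t event_count = 0;

static uint8_t buffers[NUM_BUFFERS][BUFFER_SIZE];
static int buffer_busy[NUM_BUFFERS];
static int dma_target = -1;          /* buffer the simulated DMA is filling */
static unsigned packets_processed = 0;
static unsigned checksum = 0;

static void log_event(enum event e)
{
    if (event_count < LOG_SIZE)
        event_log[event_count++] = e;
}

/* Hypothetical helper: reserve a free buffer, or return -1 if none is left. */
static int acquire_free_buffer(void)
{
    for (int i = 0; i < NUM_BUFFERS; i++) {
        if (!buffer_busy[i]) {
            buffer_busy[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Hypothetical stand-in for the real do_async_transfer() call. */
static void start_async_rx(int index)
{
    assert(index >= 0 && index < NUM_BUFFERS);
    dma_target = index;
    log_event(EV_DMA_ARMED);
}

/* Placeholder for the real packet handling; releases the buffer when done. */
static void process_packet(int index)
{
    checksum += buffers[index][0];
    packets_processed++;
    buffer_busy[index] = 0;
    log_event(EV_PACKET_PROCESSED);
}

/* Arm the first transfer before any packet has arrived. */
void rx_start(void)
{
    int first = acquire_free_buffer();
    if (first >= 0)
        start_async_rx(first);
}

/* Called when the DMA has finished filling the current buffer. */
void rx_callback(void)
{
    int done = dma_target;

    /* 1. Find a buffer and re-arm the DMA before doing anything else. */
    int next = acquire_free_buffer();
    if (next >= 0)
        start_async_rx(next);
    else
        dma_target = -1;             /* pool exhausted for now */

    /* 2. Only now spend time on the packet that just arrived. */
    if (done >= 0)
        process_packet(done);

    /* 3. If the pool was empty, the buffer freed in step 2 can be reused. */
    if (dma_target < 0)
        rx_start();
}
```

Call rx_start() once at startup and rx_callback() from the DMA completion callback. In the real thing, step 1 is where the hardware starts receiving the next packet, so everything in step 2 overlaps with that transfer.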
If you have enough on-chip RAM in your FPGA, it can also be a good idea to use it for your network packets instead of main RAM. You can even use a dual-port RAM, with one port connected to the DMA and the other to the Nios CPU.