Without profiling it's not possible to give a definite answer, but yes, I think so. Packing lots of small packets in software takes time and reduces the bandwidth. If you want high bandwidth while keeping the same solution, you need to pack several packets together. With the SGDMA you can even configure the DMA to automatically pick up the different fragments from wherever they are in memory and assemble them, which reduces the CPU usage and increases the bandwidth further. But as you say, it will increase the latency. You have to choose between latency and bandwidth.
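To make the packing idea concrete, here is a rough sketch in plain C of collecting several small packets into one buffer before handing it off. It is only an illustration, not code for your design: `batch_t`, `batch_add`, `batch_flush` and the 512-byte `BATCH_CAPACITY` are names and values I made up, and the flush only counts bytes where a real system would start the USB transfer (or build an SGDMA descriptor chain).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed size of one outgoing transfer; pick whatever suits your USB side. */
#define BATCH_CAPACITY 512u

/* Hypothetical batching state: one staging buffer plus simple counters. */
typedef struct {
    uint8_t  data[BATCH_CAPACITY];
    size_t   used;        /* bytes currently waiting in the buffer */
    unsigned flushes;     /* number of transfers handed off so far */
    size_t   bytes_sent;  /* total bytes handed off so far */
} batch_t;

/* Stand-in for starting the real transfer: here it only updates counters. */
static void batch_flush(batch_t *b)
{
    if (b->used == 0) {
        return; /* nothing pending */
    }
    b->flushes++;
    b->bytes_sent += b->used;
    b->used = 0;
}

/* Append one packet, flushing first if it would not fit.
 * Returns 0 on success, -1 if the packet is larger than the buffer. */
static int batch_add(batch_t *b, const uint8_t *pkt, size_t len)
{
    if (len > BATCH_CAPACITY) {
        return -1;
    }
    if (b->used + len > BATCH_CAPACITY) {
        batch_flush(b);
    }
    memcpy(b->data + b->used, pkt, len);
    b->used += len;
    return 0;
}
```

With 64-byte packets, eight of them go out as a single transfer instead of eight, which is where the bandwidth gain comes from. The cost is that a packet sits in the buffer until the buffer fills, so in practice you would also flush on a timeout to bound the latency. This sketch also drops packet boundaries; if the receiver needs them, you would have to add a length header per packet.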
Another solution that gives both high bandwidth and low latency would be to do the whole Ethernet-to-USB conversion in hardware instead, without using the Nios CPU or the DMAs at all. But it's more work.