Can you use a profiler? It should indicate what part of your code is using the most CPU and should help you find what to optimize.
Another way to do that is to comment out part of the functionality and see if you get any significant speed increase. If you do, the bottleneck is in the code you commented out.
Do you also concatenate several Ethernet frames together before sending them over USB? If not, and if each USB transfer carries a significant fixed overhead, that could be the cause.
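To illustrate the concatenation idea, here is a minimal sketch in C. Everything in it is an assumption for illustration: the names (`frame_batch`, `batch_add`, `batch_flush`, `usb_send_stub`), the 4096-byte batch size, and the 2-byte little-endian length prefix are hypothetical and not taken from your code or any particular USB stack. The stub only counts transfers; real firmware would start a USB transfer there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BATCH_CAPACITY 4096u   /* hypothetical size of one USB transfer */

typedef struct {
    uint8_t  data[BATCH_CAPACITY];
    size_t   used;            /* bytes currently queued in data[] */
    unsigned flushes;         /* number of (simulated) USB transfers */
    size_t   last_flush_len;  /* size of the most recent transfer */
} frame_batch;

/* Placeholder for the real USB send: only records what would be sent. */
static void usb_send_stub(frame_batch *b)
{
    b->flushes++;
    b->last_flush_len = b->used;
    b->used = 0;
}

/* Send whatever is queued, if anything. */
static void batch_flush(frame_batch *b)
{
    assert(b != NULL);
    if (b->used > 0) {
        usb_send_stub(b);
    }
}

/* Append one frame, prefixed with its length (2 bytes, little-endian).
 * Flushes first if the frame does not fit in the remaining space.
 * Returns 0 on success, -1 if the frame can never fit in a batch. */
static int batch_add(frame_batch *b, const uint8_t *frame, uint16_t len)
{
    assert(b != NULL && frame != NULL);
    size_t need = 2u + (size_t)len;
    if (need > BATCH_CAPACITY) {
        return -1;
    }
    if (b->used + need > BATCH_CAPACITY) {
        batch_flush(b);
    }
    b->data[b->used]     = (uint8_t)(len & 0xFFu);
    b->data[b->used + 1] = (uint8_t)(len >> 8);
    memcpy(&b->data[b->used + 2], frame, len);
    b->used += need;
    return 0;
}
```

You would call `batch_add` for every received frame and `batch_flush` from a short timeout so that a lone frame is not delayed indefinitely. Note that this sketch copies each frame once into the batch buffer; whether that copy costs less than the per-transfer overhead it saves is something the profiler should tell you.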
How are you sending to USB? Are you using DMA that reads directly from the buffer the Ethernet frame was received into? If your software copies the data around, that can reduce the bandwidth significantly.
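As a rough sketch of what avoiding copies can look like, the receive side can hand the USB side a pointer to the buffer instead of the frame contents. The names below (`frame_buf`, `ptr_queue`, `queue_push`, `queue_pop`) and the sizes are hypothetical, and the code assumes a single producer and a single consumer; if one side runs in an interrupt, real firmware would also need the appropriate `volatile` qualifiers or memory barriers for your platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BUFS 4u      /* hypothetical; must be a power of two */
#define BUF_SIZE 1536u   /* hypothetical; room for one Ethernet frame */

typedef struct {
    uint8_t  data[BUF_SIZE];
    uint16_t len;
} frame_buf;

/* Ring of pointers: only addresses move, never the frame bytes. */
typedef struct {
    frame_buf *slots[NUM_BUFS];
    unsigned   head;   /* next slot to write (Ethernet RX side) */
    unsigned   tail;   /* next slot to read (USB TX side) */
} ptr_queue;

/* Queue a filled buffer for USB. Returns -1 if the ring is full. */
static int queue_push(ptr_queue *q, frame_buf *f)
{
    assert(q != NULL && f != NULL);
    if (q->head - q->tail == NUM_BUFS) {
        return -1;
    }
    q->slots[q->head % NUM_BUFS] = f;
    q->head++;
    return 0;
}

/* Take the oldest queued buffer, or NULL if the ring is empty. */
static frame_buf *queue_pop(ptr_queue *q)
{
    assert(q != NULL);
    if (q->head == q->tail) {
        return NULL;
    }
    frame_buf *f = q->slots[q->tail % NUM_BUFS];
    q->tail++;
    return f;
}
```

The Ethernet receive path would call `queue_push` with the buffer its DMA just filled, and the USB path would call `queue_pop` and point its own DMA at `data` for `len` bytes, returning the buffer to the receive pool once the transfer completes.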