The IP stacks only use interrupts, since polling usually uses too much CPU time. For higher performance, some stacks have a high-priority thread whose only job is to receive the packet after an interrupt and re-arm the DMA for the next one, while a lower-priority thread does the actual processing of the packet.
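To illustrate the two-thread split, here is a minimal sketch of the hand-off queue that could sit between the two threads. It is not taken from any particular stack: the names (`rx_ring_t`, `rx_ring_push`, `rx_ring_pop`), the ring depth and the frame size are all made up for the example, and it assumes exactly one receive thread and one processing thread. The receive thread only copies the frame into a free slot and returns, so it can re-arm the DMA quickly. The `dropped` counter shows how many frames were lost because the processing thread fell behind.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes chosen for the example only. */
#define RX_RING_SLOTS 8u     /* ring depth, must be a power of two */
#define RX_FRAME_MAX  1518u  /* largest frame the example accepts  */

static_assert((RX_RING_SLOTS & (RX_RING_SLOTS - 1u)) == 0,
              "RX_RING_SLOTS must be a power of two");

typedef struct {
    uint16_t len;
    uint8_t  data[RX_FRAME_MAX];
} rx_frame_t;

typedef struct {
    rx_frame_t  slot[RX_RING_SLOTS];
    atomic_uint head;     /* advanced only by the receive thread    */
    atomic_uint tail;     /* advanced only by the processing thread */
    unsigned    dropped;  /* frames discarded because ring was full */
} rx_ring_t;

static void rx_ring_init(rx_ring_t *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
    r->dropped = 0;
}

/* High-priority side: called after the receive interrupt, before the
 * DMA is configured for the next frame. Never blocks. */
static bool rx_ring_push(rx_ring_t *r, const uint8_t *buf, size_t len)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (len > RX_FRAME_MAX || head - tail == RX_RING_SLOTS) {
        r->dropped++;             /* oversized frame or ring full */
        return false;
    }
    rx_frame_t *f = &r->slot[head & (RX_RING_SLOTS - 1u)];
    memcpy(f->data, buf, len);
    f->len = (uint16_t)len;
    /* Publish the slot only after its contents are written. */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}

/* Low-priority side: fetch the next frame for the actual processing. */
static bool rx_ring_pop(rx_ring_t *r, rx_frame_t *out)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return false;             /* nothing queued */
    *out = r->slot[tail & (RX_RING_SLOTS - 1u)];
    /* Release the slot back to the receive thread. */
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}
```

If `dropped` keeps climbing while the ring is a reasonable depth, the processing side is the bottleneck and a deeper ring will only delay the drops. A real driver would normally pass buffer pointers or DMA descriptors through the ring instead of copying each frame; the copy is there only to keep the sketch short.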
It's difficult to tell without the code, but from your description it looks like there is too much processing on the CPU side, and that is what causes the packet drops. Use a profiler to find where the bottleneck is.
Do you have to do much software processing on the Ethernet frames? If it is just simple encapsulation, you'll get better performance by doing everything in hardware instead of going through a DMA and a software stack. I've never used USB cores, but if yours has Avalon Streaming (Avalon-ST) interfaces, it shouldn't be too complicated to set up.