As you realized, PIO is a very inefficient way to transfer data, because the CPU has to write every byte itself.
A common solution is to wrap your logic in an Avalon-MM slave interface and then use DMA to transfer the data. That way your CPU only needs to start the DMA transfer when the UDP packet arrives, and is then free for other tasks while the transfer is in progress.
If you would rather not use an MM slave, you can implement a serial interface (e.g. SPI) in your Verilog and again use DMA to transfer the data.
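However, I guess you currently use the PIO as a parallel bus with 8-bit data and a write signal; in that case the MM slave solution is quite straightforward and takes minimal effort, since those two signals correspond closely to the slave's writedata and write signals.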
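
To give an idea of how little the CPU has to do with the DMA approach, here is a rough C sketch of starting a transfer and checking for completion. The register layout, bit positions and the names dma_regs_t, dma_start and dma_done are all made up for illustration; they are not the register map of any particular DMA core, so take the real offsets and bits from the documentation of the controller you instantiate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block of a memory-to-peripheral DMA controller.
 * Layout and bit positions are illustrative assumptions only. */
typedef struct {
    volatile uint32_t status;     /* bit 0 = DONE, bit 1 = BUSY */
    volatile uint32_t read_addr;  /* source: packet buffer in memory */
    volatile uint32_t write_addr; /* destination: your Avalon-MM slave */
    volatile uint32_t length;     /* number of bytes to move */
    volatile uint32_t control;    /* bit 0 = GO, bit 1 = fixed destination */
} dma_regs_t;

#define DMA_STATUS_DONE   (1u << 0)
#define DMA_STATUS_BUSY   (1u << 1)
#define DMA_CTRL_GO       (1u << 0)
#define DMA_CTRL_WR_FIXED (1u << 1) /* keep writing to the same slave address */

/* Program and start one transfer.
 * Returns 0 on success, -1 if the controller is still busy. */
int dma_start(dma_regs_t *dma, uint32_t src, uint32_t dst, uint32_t len)
{
    assert(dma != NULL && len > 0);

    if (dma->status & DMA_STATUS_BUSY)
        return -1;

    dma->status     = 0;   /* clear a stale DONE flag (assumed behaviour) */
    dma->read_addr  = src;
    dma->write_addr = dst;
    dma->length     = len;
    dma->control    = DMA_CTRL_WR_FIXED | DMA_CTRL_GO;
    return 0;
}

/* Non-blocking completion check; the CPU can do other work in between. */
int dma_done(const dma_regs_t *dma)
{
    return (dma->status & DMA_STATUS_DONE) != 0;
}
```

In your UDP receive handler you would call something like dma_start() with the address of the packet payload, the base address of your slave and the payload length, then return immediately and either poll dma_done() later or let the controller raise an interrupt. In a real system you would probably use the DMA driver that comes with your BSP instead of writing registers by hand; the sketch is only meant to show the flow.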