This is starting to sound like using a PIO would be a bad idea. If, for example, the memory holding the data stalls, you could end up with idle cycles between accesses to the PIO. Sounds more like FIFOs would be your best bet, with the DMA filling them faster than they are drained to protect against underflow.
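To illustrate the underflow argument, here is a rough cycle-by-cycle sketch in Python. It is only a toy model, not a description of any particular hardware: the function name, the stall pattern (the memory blocks for the first `stall_len` cycles of every `stall_period` cycles) and all the numbers are made up for the example. The consumer wants one word every cycle, and the model counts how often it finds nothing to read.

```python
def count_underflows(cycles, depth, words_per_fill,
                     stall_period, stall_len, prefill=0):
    """Count cycles in which the consumer finds the buffer empty.

    All parameters are hypothetical values for this toy model.
    """
    level = min(prefill, depth)
    underflows = 0
    for cycle in range(cycles):
        # Assumed stall pattern: the source memory blocks for the first
        # `stall_len` cycles of every `stall_period` cycles.
        stalled = (cycle % stall_period) < stall_len
        if not stalled:
            # The producer (DMA) adds words, limited by the buffer depth.
            level = min(depth, level + words_per_fill)
        # The consumer needs exactly one word per cycle.
        if level > 0:
            level -= 1
        else:
            underflows += 1
    return underflows


# PIO-like case: no real buffering, producer no faster than the consumer.
direct = count_underflows(cycles=80, depth=1, words_per_fill=1,
                          stall_period=8, stall_len=2)

# FIFO case: 16-deep buffer, producer twice as fast, small prefill.
buffered = count_underflows(cycles=80, depth=16, words_per_fill=2,
                            stall_period=8, stall_len=2, prefill=4)

print("underflows without buffering:", direct)
print("underflows with FIFO and faster DMA:", buffered)
```

In this model every stall shows up as a gap on the output when there is no buffering, while the FIFO that is filled faster than it is drained rides through the same stalls. How deep the FIFO needs to be depends on the worst-case stall you expect.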
While you are looking at DMAs you might want to evaluate this one:
http://www.alterawiki.com/wiki/modular_sgdma It's an SGDMA, only with a much simpler programming model (to software it looks like the regular DMA, with a FIFO buffering descriptors internally).