Are your four descriptors chained? When is your callback called? Is it when the chain is completed, or for each descriptor completion?
To get the most out of the SGDMA, the secret is to keep it busy at all times, which means it must always have at least one usable descriptor. If you wait until it has parked at the end of the chain before you give it a new one, you are losing time. I've never used the callback mechanism, only my own ISR, but I think this is possible with callbacks too.
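To illustrate the "always one usable descriptor" idea, here is a minimal sketch in plain C. It only simulates the bookkeeping and is not the HAL SGDMA API: the `sketch_*` types and the `ring_init`, `hw_complete_one` and `on_descriptor_done` functions are names I made up for the example, and the `owned_by_hw` flag is just a stand-in for whatever ownership bit your descriptors really have.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 4  /* assumed: four descriptors, as in the question */

/* Hypothetical software view of a descriptor, not the real SGDMA layout. */
typedef struct {
    bool owned_by_hw;  /* true while the DMA engine may still use it */
} sketch_desc_t;

typedef struct {
    sketch_desc_t desc[RING_SIZE];
    size_t hw_index;   /* descriptor the hardware completes next */
    size_t sw_index;   /* oldest completed descriptor, next to recycle */
    size_t queued;     /* descriptors currently usable by the hardware */
} sketch_ring_t;

/* Hand every descriptor to the hardware before the transfer starts. */
void ring_init(sketch_ring_t *r)
{
    for (size_t i = 0; i < RING_SIZE; i++)
        r->desc[i].owned_by_hw = true;
    r->hw_index = 0;
    r->sw_index = 0;
    r->queued = RING_SIZE;
}

/* Simulates the hardware finishing one descriptor.
 * Returns false when nothing is queued, i.e. the engine has parked. */
bool hw_complete_one(sketch_ring_t *r)
{
    if (r->queued == 0)
        return false;
    assert(r->desc[r->hw_index].owned_by_hw);
    r->desc[r->hw_index].owned_by_hw = false;
    r->hw_index = (r->hw_index + 1) % RING_SIZE;
    r->queued--;
    return true;
}

/* What a per-descriptor ISR or callback would do: give the completed
 * descriptor straight back to the hardware instead of waiting for the
 * whole chain to finish. Real code would also attach a fresh buffer. */
void on_descriptor_done(sketch_ring_t *r)
{
    sketch_desc_t *d = &r->desc[r->sw_index];
    assert(!d->owned_by_hw);
    d->owned_by_hw = true;
    r->sw_index = (r->sw_index + 1) % RING_SIZE;
    r->queued++;
}
```

If `on_descriptor_done` runs after every completion, `queued` never reaches zero and `hw_complete_one` never reports a parked engine. If you only refill once the whole chain is done, the engine sits idle in between.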
Often the cause of low throughput isn't the SGDMA itself but the software that controls it. Do as little processing as possible when handling the SGDMA interrupts, and in particular use pre-allocated memory buffers: a malloc() call is very costly.
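A pre-allocated buffer pool can be as simple as the sketch below. The pool size, the buffer size and the `pool_*` names are assumptions for the example, not something from the HAL. Getting or returning a buffer is just a push or pop on a small stack, so it is cheap enough to do from an interrupt handler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_BUFFERS 8     /* assumed number of buffers */
#define BUFFER_BYTES 2048  /* assumed size of one buffer */

/* All memory is reserved at link time, so nothing is allocated at run time. */
static uint8_t pool_storage[POOL_BUFFERS][BUFFER_BYTES];
static uint8_t *free_stack[POOL_BUFFERS];
static size_t free_count;

/* Put every buffer on the free stack. Call once before starting the DMA. */
void pool_init(void)
{
    for (size_t i = 0; i < POOL_BUFFERS; i++)
        free_stack[i] = pool_storage[i];
    free_count = POOL_BUFFERS;
}

/* Take a free buffer in constant time, or NULL if the pool is exhausted. */
uint8_t *pool_get(void)
{
    if (free_count == 0)
        return NULL;
    return free_stack[--free_count];
}

/* Return a buffer to the pool in constant time. */
void pool_put(uint8_t *buf)
{
    assert(buf != NULL);
    assert(free_count < POOL_BUFFERS);
    free_stack[free_count++] = buf;
}
```

Note that if both the ISR and the main loop touch the pool, the accesses have to be protected, for example by disabling interrupts around `pool_get` and `pool_put` in the main loop. The sketch leaves that out.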
Also be sure that your processing of the packet data isn't the bottleneck. Obviously it has to take less time than the interval between two packets.
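As a rough way to check that budget, you can compute the interval between back-to-back packets and compare it with your measured processing time. This is only a sketch: the function names are made up, and it assumes packets arrive continuously at a known rate and ignores any framing overhead.

```c
#include <assert.h>
#include <stdbool.h>

/* Interval between back-to-back packets, in microseconds.
 * 1 Mbit/s is 1 bit per microsecond, so bits / (Mbit/s) gives microseconds. */
double packet_interval_us(unsigned packet_bytes, double rate_mbit_per_s)
{
    assert(rate_mbit_per_s > 0.0);
    return (packet_bytes * 8.0) / rate_mbit_per_s;
}

/* True if the per-packet processing fits inside the packet interval. */
bool processing_keeps_up(double processing_us, double interval_us)
{
    return processing_us < interval_us;
}
```

For example, 1500-byte packets at 100 Mbit/s arrive every 120 microseconds, so the whole per-packet handling (ISR plus data processing) has to stay below that.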