--- Quote Start ---
There is an option in the MAC to add/remove two extra null bytes at the beginning of the packet.
--- Quote End ---
I'm aware of this option. Under Linux there is a constant, NET_IP_ALIGN, set to either 0 or (more typically) 2, which determines whether the network stack expects the extra 2 bytes. On mipsel it is 2, so my driver sets RX_SHIFT16 and TX_SHIFT16 and adjusts the DMA pointers as it should. Everything here works as expected. (The existing altera_tse.c and atse.c drivers seem to completely misunderstand it.)
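For anyone wondering why the 2-byte shift matters, here is a minimal sketch of the arithmetic. The Ethernet header is 14 bytes, so without padding the IP header starts at offset 14, which is not a multiple of 4. The function names are my own for illustration and are not taken from any driver.

```c
#include <assert.h>
#include <stddef.h>

#define ETH_HLEN 14  /* destination MAC (6) + source MAC (6) + EtherType (2) */

/* Byte offset of the IP header from the start of a 4-byte aligned buffer,
 * given the number of pad bytes the MAC inserts in front of the frame. */
size_t ip_header_offset(size_t pad)
{
    return pad + ETH_HLEN;
}

/* Nonzero if the IP header lands on a 4-byte boundary. */
int ip_header_aligned(size_t pad)
{
    return ip_header_offset(pad) % 4 == 0;
}

/* Hypothetical self-check covering the two values NET_IP_ALIGN can take. */
void ip_align_selfcheck(void)
{
    assert(!ip_header_aligned(0)); /* offset 14: misaligned */
    assert(ip_header_aligned(2));  /* offset 16: aligned */
}
```

With the shift enabled in both directions, the driver only has to make sure its buffer pointers account for the same 2 bytes the stack expects.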
The main alignment issue arises because my system buses and DMA are 64 bits wide, as are the CPUs' cache interfaces, but the CPUs are otherwise 32-bit. This means my buffers are only aligned on 4-byte boundaries, not 8. The unaligned transfer support in the mSGDMA controllers does work as advertised, so that part is an easy fix.
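To make the 4-versus-8 mismatch concrete, here is a rough sketch of the check involved. BUS_BYTES and the function names are assumptions of mine for illustration; the real driver would be working with DMA addresses, not plain integers.

```c
#include <assert.h>
#include <stdint.h>

#define BUS_BYTES 8u  /* assumed: 64-bit bus and DMA data path */

/* Number of bytes by which addr sits past the previous bus-width boundary. */
unsigned bus_misalignment(uintptr_t addr)
{
    /* The mask trick below is only valid for power-of-two widths. */
    assert((BUS_BYTES & (BUS_BYTES - 1u)) == 0);
    return (unsigned)(addr & (BUS_BYTES - 1u));
}

/* Nonzero if a transfer starting at addr relies on unaligned DMA support. */
int needs_unaligned_dma(uintptr_t addr)
{
    return bus_misalignment(addr) != 0;
}
```

A buffer at an address such as 0x1004 is fine for a 32-bit CPU but sits 4 bytes into a 64-bit bus word, which is the case the unaligned transfer support has to cover.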
With those problems sorted I can boot my system on a uniprocessor kernel over the network for about 2 minutes, then the transmit DMA deadlocks due to a missed TX completion. With all 4 CPUs active the DMA lockup happens within a second.
The issue is a race between the RX/TX IRQ handlers and the DMA hardware. It is possible for the mSGDMA to complete a descriptor and push it to the response buffer at the same time the CPU drains the response buffer and clears the interrupt flag. Altera's SGDMA has similar problems: the descriptor update can race with the descriptor chain walk, and the CPU can wind up not seeing freshly completed packets.
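The race window described above can be illustrated with a toy model. This is only a sketch: the struct and function names are hypothetical, and it does not reflect the actual mSGDMA register layout or timing. It just shows how a completion that lands between "drain" and "clear" ends up queued with no interrupt pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy DMA engine: a count of queued responses plus a latched IRQ flag. */
struct toy_dma {
    int responses;    /* completed descriptors waiting in the response buffer */
    bool irq_pending; /* latched interrupt flag */
};

/* Hardware side: complete one descriptor and raise the interrupt. */
void toy_complete(struct toy_dma *d)
{
    d->responses++;
    d->irq_pending = true;
}

/* CPU side: reap everything currently queued; returns the number reaped. */
int toy_drain(struct toy_dma *d)
{
    int n = d->responses;
    assert(n >= 0);
    d->responses = 0;
    return n;
}

/* IRQ handler that drains first and clears the flag second.
 * late_completions models descriptors that finish inside the window
 * between those two steps. */
int handler_drain_then_clear(struct toy_dma *d, int late_completions)
{
    int reaped = toy_drain(d);
    for (int i = 0; i < late_completions; i++)
        toy_complete(d);
    d->irq_pending = false; /* also wipes the flag raised by the late ones */
    return reaped;
}

/* True if a completion is queued but no IRQ is pending to make the CPU look. */
bool toy_stranded(const struct toy_dma *d)
{
    return d->responses > 0 && !d->irq_pending;
}
```

Calling handler_drain_then_clear() with late_completions set to 0 leaves the model clean, while a value of 1 leaves one response stranded. In a TX path that stranded response is a missed completion, so the queue eventually stops moving.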
Altera's TSE/SGDMA driver appears to avoid this problem by simply polling for TX completion... which is an utter travesty. The entire point of having scatter-gather DMA is to queue multiple packets and then move on to processing something else, not to sit there busy-looping.
It looks like I am going to have to write my own DMA controller to avoid this bug. I don't want to change the mSGDMA code, since it is already far more complicated than it really should be. The DMA hardware needs cleaner IRQ behavior to close the race windows without polling hacks. It should only take me a few days to sort out properly.