Lance,
Thanks for the reply. As it turns out, the issue was two unrelated "problems". The first is a real bug, and we haven't found the source yet: the driver and the DMA system seem to occasionally get out of sync on power-up. Power cycling the system usually resolves it, but for some reason the software guy didn't think of that. It's a serious problem, but one that happens infrequently and is easily worked around.
The other issue is that I forgot how the bus interface code worked. My board is driving a bridge to the host PC over a synchronous LVDS interface. Because my board is the clock source, I align the data on the falling edges of the clock. Due to pipelining of the block memories, I have to run the signals through a few delay registers to line everything up properly, which is what was confusing me. Once I started qualifying on the clock correctly, everything turned out to be working at the output of the board. I just hadn't documented it well enough to remember a year and a half later...
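
For anyone who hits the same confusion, here is a rough Python model of the delay-register idea. It is only a sketch, not the actual HDL: the two-cycle memory latency, the function name, and the parameter names are all made up for illustration, and it models whole clock cycles only (the falling-edge alignment is not represented). The point is that the qualifier has to be delayed by the same number of registers as the memory pipeline, or the data appears shifted.

```python
from collections import deque

def simulate(memory, addresses, ram_latency=2, valid_delay=2):
    """Cycle-by-cycle model of a pipelined memory read (hypothetical values).

    The memory returns data `ram_latency` cycles after the address is
    presented. The valid qualifier passes through `valid_delay` registers.
    Only when the two delays match does each sample line up with its data.
    """
    ram_pipe = deque([None] * ram_latency)     # memory output pipeline
    valid_pipe = deque([False] * valid_delay)  # delay registers for the qualifier
    flush = max(ram_latency, valid_delay)      # idle cycles to drain both pipes
    stimulus = [(a, True) for a in addresses] + [(None, False)] * flush

    captured = []
    for addr, valid in stimulus:
        # Registers load their new inputs on this clock edge...
        ram_pipe.append(memory[addr] if addr is not None else None)
        valid_pipe.append(valid)
        # ...and the oldest values appear at the outputs.
        data = ram_pipe.popleft()
        data_valid = valid_pipe.popleft()
        if data_valid:
            captured.append(data)
    return captured

mem = [10, 20, 30]
aligned = simulate(mem, [0, 1, 2])                    # delays match
misaligned = simulate(mem, [0, 1, 2], valid_delay=1)  # one register short
print(aligned)      # [10, 20, 30]
print(misaligned)   # [None, 10, 20]
```

With one delay register missing, the qualifier fires a cycle early, so the first sample is captured before the memory has produced anything and the last word is dropped. That is roughly what it looks like when you qualify on the wrong cycle.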