Striding across PCIe doesn't sound like a good idea to me. For starters, when stride is enabled you have to disable bursting, which will kill your PCIe performance. Related to that, each word transfer will need its own PCIe packet, because stride causes the addresses to *not* be contiguous. Perhaps your impression of the feature is different from what is implemented. For example, if you set up a stride of 16 for a DMA that is 64 bits wide, these are the addresses you should see accessed, assuming a start location of 0x0:
0x000-0x007
0x080-0x087
0x100-0x107
etc.
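
If it helps to see where those numbers come from, here is a rough sketch of the arithmetic in Python. It assumes the stride is counted in words rather than bytes, which is what the example above implies (16 words x 8 bytes = 0x80 between accesses). The function name and parameters are made up for illustration and are not part of any DMA driver API. The ranges are printed inclusive, so each 8-byte word ends at offset +7.

```python
def strided_accesses(start, stride_words, width_bits, count):
    """Return (first_byte, last_byte) for each word a strided DMA would touch.

    Hypothetical helper: assumes the stride is expressed in words of the
    DMA data width, as in the example above.
    """
    word_bytes = width_bits // 8             # 64-bit data path -> 8 bytes per word
    step_bytes = stride_words * word_bytes   # 16 words * 8 bytes = 0x80 per step
    return [
        (start + i * step_bytes, start + i * step_bytes + word_bytes - 1)
        for i in range(count)
    ]


# Stride of 16 on a 64-bit DMA, starting at 0x0
for first, last in strided_accesses(0x0, 16, 64, 3):
    print(f"0x{first:03x}-0x{last:03x}")
```

Each of those accesses is a single isolated 8-byte word with a gap before the next one, which is why they can't be merged into a burst.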