An IOWR or IORD typically seems to take only about 3 clock cycles. So for your 22 PIOs that's 66 clock cycles. Is that too slow? By my calculation that comes to roughly 0.7 µs at your current clock speed, which is cutting it close. What are you going to do with the data after you get it? Will you have enough time left to do whatever processing you're planning to do?
Maybe I'll write a little HDL module that will read them all for you and store them into memory.
If you could do one read per clock cycle, that would get your total sample time down to 0.260 µs.
Realistically, you could do two reads per clock cycle in HDL and get it down to 0.130 µs.
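If you want to rerun those numbers for a different clock, here is a minimal C sketch of the arithmetic. The function name `sample_time_us` and the `ASSUMED_CLOCK_HZ` value are my own placeholders: the clock is only inferred from the "22 reads in 0.260 µs" figure above, so substitute your real system clock.

```c
#include <assert.h>

/* ASSUMPTION: about 84.6 MHz, back-calculated from 22 cycles = 0.260 us.
   Replace with your actual system clock frequency. */
#define ASSUMED_CLOCK_HZ 84.6e6

/* Time in microseconds to read `reads` PIOs when each read costs
   `cycles_per_read` clock cycles (use 0.5 for two reads per cycle). */
double sample_time_us(int reads, double cycles_per_read, double clock_hz)
{
    assert(reads >= 0 && cycles_per_read > 0.0 && clock_hz > 0.0);
    return (double)reads * cycles_per_read / clock_hz * 1e6;
}
```

Calling it with 22 reads and 3.0, 1.0, or 0.5 cycles per read gives the software case, the one-read-per-cycle case, and the two-reads-per-cycle case. With the assumed clock, the 3-cycle case lands a little under 0.8 µs, in the same ballpark as the estimate above.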
Jake