Hi Guys,
Thank you very much indeed for your replies.
1. I did use "Fast Output Registers".
2. The setup slacks are all positive.
3. The longest delays are caused by IOOBUF delays.
4. I think the timing difference among the data bits matters. The bigger the timing difference is, the lower the achievable data rate will be.
5. I did use one VREF pin for the data. After I moved that data bit to a non-VREF pin, there was a significant improvement. Now the maximum timing difference among the data bits is less than 0.3 ns.
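
To illustrate point 4, here is a rough sketch of how the skew among the data bits eats into the time available per bit. The delay values, the 5.0 ns bit period, and the function names are made up for illustration; they are not taken from my timing report.

```python
def max_skew_ns(tco_ns):
    """Return the spread between the slowest and fastest output bit, in ns."""
    delays = list(tco_ns.values())
    return max(delays) - min(delays)


def valid_window_ns(bit_period_ns, skew_ns):
    """Time per bit during which all data lines carry the same word, in ns."""
    return bit_period_ns - skew_ns


# Hypothetical clock-to-output delays per data bit (assumed, not measured).
example_tco_ns = {
    "data[0]": 3.10,
    "data[1]": 3.25,
    "data[2]": 3.05,
    "data[3]": 3.30,
}

skew = max_skew_ns(example_tco_ns)
# Assumed 5.0 ns bit period, i.e. 200 Mbit/s per pin.
window = valid_window_ns(5.0, skew)
print(f"skew = {skew:.2f} ns, valid window = {window:.2f} ns")
```

The receiver's setup and hold requirements have to fit inside that window, so a larger skew forces a longer bit period.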
I really appreciate your suggestions.
Many thanks and best regards,
Bcao