If a (CMU) PLL output is shared among the RXs within a quad, then to me that means the 4 RX buffers sample the incoming serial streams at the same instant, but does it guarantee that the 4 deserializer outputs are aligned?
If RXs are bonded, a common refclock feeds:
1) the PFD of the RX CDRs (depending on LTR/LTD mode? To be checked)
2) the RX phase compensation FIFO wrclk
but each RX still provides its own serial and parallel recovered clocks. I don't see any guarantee that each of the 4 deserializers will be clocked with the same parallel (low-speed) clock phase.
A bit misalignment between the incoming serial streams would rather point me to the PCS word aligner.
Are you using an alignment pattern?
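
To illustrate why I would look at the word aligner, here is a small Python toy model (not vendor code, just a sketch). It assumes 10-bit words, uses the 8b/10b K28.5 comma (RD-) as the alignment pattern, and uses an arbitrary alternating filler word as payload. The function names and the per-lane phase values are made up for the example. Each lane chops the same serial stream into words starting at a different bit offset, which stands in for a different parallel clock phase, and the aligner recovers the word boundary per lane by searching for the pattern.

```python
WORD = 10
# K28.5 comma, RD- encoding, bits in transmission order (abcdei fghj).
PATTERN = [0, 0, 1, 1, 1, 1, 1, 0, 1, 0]
# Arbitrary filler word (assumption): alternating bits, never mimics the comma.
FILLER = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

def chop(bits):
    """Split a bit list into complete WORD-bit words, dropping any tail."""
    n = len(bits) // WORD
    return [bits[i * WORD:(i + 1) * WORD] for i in range(n)]

def deserialize(bits, phase):
    """Model a deserializer whose parallel clock starts `phase` bits late."""
    return chop(bits[phase:])

def find_slip(words):
    """Return how many bits must be slipped to land on the word boundary."""
    flat = [b for w in words for b in w]
    for i in range(len(flat) - WORD + 1):
        if flat[i:i + WORD] == PATTERN:
            return i % WORD
    return None  # pattern not seen: the aligner cannot lock

def realign(words, slip):
    """Apply the bit slip and re-chop into words."""
    flat = [b for w in words for b in w]
    return chop(flat[slip:])

# Same serial data on all four lanes, one comma embedded in filler.
stream = FILLER * 2 + PATTERN + FILLER * 3
# Hypothetical per-lane parallel clock phases, in bits.
phases = [0, 3, 7, 9]

lanes = [deserialize(stream, p) for p in phases]
slips = [find_slip(lane) for lane in lanes]
aligned = [realign(lane, s) for lane, s in zip(lanes, slips)]

for p, s in zip(phases, slips):
    print(f"phase {p}: aligner slips {s} bit(s)")
```

In this model every lane sees identical serial data, yet each lane needs a different bit slip before its parallel words line up with the transmitted word boundary. Without an alignment pattern in the stream, `find_slip` has nothing to lock onto.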