Again, let me answer your questions one by one.
1. This is really hard to do. We first observed the problem in our system, which has multiple boards in the chain, in different cabinets. The link is duplex, but we only use it in one direction, FPGA1 to FPGA2. The symptom was that the RF signal at the end of the chain came out wrong; once we realized there was a problem, we put SignalTap at each stage, and we now suspect FPGA1. I could put a loopback at the FPGA1 output, but I could only test it by putting SignalTap on the FPGA1 Rx. Since we don't use that direction, that is quite a big change, and every time we change the design, or even just recompile with no change, the problem goes away. Second, if we put FPGA1 in loopback, FPGA2 won't detect a link, so the system software will not run and create packets. We would have to change the software to run the test. That is quite an exercise.
2. Same reason as 1. I also really doubt it is the PHY, the transceivers, or the high-speed serial link itself. Our system software monitors the error signals (err_rr_8berrdet, err_rr_disp, err_rr_pol_rev_required, overflow, underflow, etc.) all the time and keeps a counter for each one to count how many times it occurs. The error counters stay at zero. I am convinced that the problem is not in the link layer but somewhere above it.
3. Again, I doubt the problem is in the link layer or below. One more piece of information: at the beginning we used streaming mode for SerialLite and never saw this problem. About one year ago we changed this link to packet mode, and the problem started to show up. That further indicates it is something in the SerialLite core.
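To make point 2 concrete, the monitoring amounts to roughly the following. This is only a simplified sketch, not our actual software: the signal names are the ones listed above, but the ErrorMonitor class, the polled-sample interface, and the assumption that one event means one rising edge of a signal are all made up for illustration.

```python
# Hypothetical sketch of per-signal error counting; only the signal
# names come from the real design, everything else is an assumption.
ERROR_SIGNALS = (
    "err_rr_8berrdet",
    "err_rr_disp",
    "err_rr_pol_rev_required",
    "overflow",
    "underflow",
)


class ErrorMonitor:
    def __init__(self, signals=ERROR_SIGNALS):
        self.counts = {name: 0 for name in signals}
        self._previous = {name: 0 for name in signals}

    def sample(self, status):
        """Take one polled snapshot: a dict of signal name -> 0/1."""
        for name in self.counts:
            level = int(bool(status.get(name, 0)))
            # Count an event only on a 0 -> 1 transition (assumed behavior).
            if level and not self._previous[name]:
                self.counts[name] += 1
            self._previous[name] = level

    def link_clean(self):
        """True while no error signal has ever been seen asserted."""
        return all(count == 0 for count in self.counts.values())
```

In these terms, what we see is that link_clean() stays true the whole time, even while the data at FPGA2 is wrong.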
Thank you