If it’s a lower-level issue, it should also show up without any traffic. You could implement a detector that is triggered by the condition on test_out and works without PCIe traffic, and perhaps drive an LED from it so you can see how often such incidents occur.
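To make the detector idea concrete, here is a rough behavioral sketch in Python of what such logic might do each clock cycle: detect a rising edge of the condition, count it, and stretch the pulse so an LED stays lit long enough to be seen. The class name, the `stretch_cycles` parameter, and the idea of sampling the condition as a single boolean are my assumptions; the real thing would be a few lines of HDL in your design, and I don't know how the condition is actually encoded on test_out.

```python
from dataclasses import dataclass


@dataclass
class IncidentDetector:
    """Behavioral model of a rising-edge detector with a pulse stretcher.

    Hypothetical sketch: 'condition' stands for whatever decoded
    test_out condition you want to watch.
    """
    stretch_cycles: int = 4  # assumed: cycles the LED stays on per incident
    prev: bool = False       # condition value seen in the previous cycle
    led_timer: int = 0       # remaining cycles the LED stays on
    count: int = 0           # number of incidents seen so far

    def clock(self, condition: bool) -> bool:
        """Advance one clock cycle and return the LED state."""
        if condition and not self.prev:
            # Rising edge: count the incident and (re)start the LED pulse.
            self.count += 1
            self.led_timer = self.stretch_cycles
        elif self.led_timer > 0:
            self.led_timer -= 1
        self.prev = condition
        return self.led_timer > 0


if __name__ == "__main__":
    det = IncidentDetector(stretch_cycles=3)
    samples = [0, 1, 1, 0, 0, 0, 0, 0, 1, 0]
    led = [det.clock(bool(s)) for s in samples]
    print("incidents:", det.count)
    print("led:", led)
```

In hardware you would pick a stretch time of tens of milliseconds so that single-cycle events are visible, and the counter could be read out through whatever debug path you have.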
In any case, you have to make sure that you respond correctly to every non-posted request sent to you; otherwise the sender will not get its credits back and will quickly run out of them. Its completion timeout mechanism will then kick in, and depending on the hardware and software configuration, this leads to soft or hard errors, lockups, or possibly an immediate reboot.
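As a toy illustration of why the credits run out so fast, the following Python sketch models a sender with a fixed pool of non-posted credits. It is deliberately simplified and is not the real PCIe flow-control protocol, which tracks header and data credits separately and returns them via flow-control update DLLPs. The function name and the one-credit-per-request accounting are assumptions made for the example.

```python
from typing import Optional


def requests_until_stall(credits: int, requests: int, responds: bool) -> Optional[int]:
    """Return how many requests were sent before the sender stalls.

    Returns None if the sender never runs out of credits.
    Toy model: one credit per non-posted request, returned only
    when the receiver actually handles the request.
    """
    available = credits
    for sent in range(requests):
        if available == 0:
            return sent          # sender is out of credits and must wait
        available -= 1           # spend one credit for this request
        if responds:
            available += 1       # receiver handled it, credit comes back
    return None


if __name__ == "__main__":
    print(requests_until_stall(credits=8, requests=100, responds=False))
    print(requests_until_stall(credits=8, requests=100, responds=True))
```

With a receiver that never handles the requests, the sender stalls as soon as its initial pool is used up, however many requests it still wants to send.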
I am not familiar with the soft IP and have never used test_out, so bear with me.