I only have a couple of minutes right now, but I have had similar problems.
For small sections of high-speed logic associated with a fixed resource, like a pin, I have had success with a very simple technique: open the timing closure floorplanner and drag and drop the registers of that logic into the LAB next to the pins. I get an identical fit every run, with the same placement and the same timing slack numbers. It isn't a proof, but performance is retained, and that's what I needed. There are less GUI-dependent ways to do it, but I like the shortest path.
For a large hierarchical design where the lower levels individually fit and ran at speed, but the routing gets horked when everything is thrown into the pot: that happened to me on a DSP-related project, and I learned the significance of registered boundaries between functional blocks. It is a style choice, but it saved my ... skin. And there is rarely a reason not to have that emphasis.
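To show why registered boundaries matter, here is a rough back-of-the-envelope sketch. The delay numbers and function names are made up for illustration, not taken from any real fit, and it ignores setup time and clock-to-output delay. The point is only that a register on each side of a block boundary splits one long path into shorter ones, so the inter-block routing no longer adds to the logic delay inside each block.

```python
# Hypothetical delays in nanoseconds; illustrative numbers only.

def worst_path_ns(logic_a, route, logic_b, registered_boundary):
    """Worst register-to-register delay across a boundary between blocks A and B."""
    if registered_boundary:
        # Registers at A's output and B's input split the path into three
        # independent segments; the slowest one limits the clock.
        return max(logic_a, route, logic_b)
    # Without boundary registers, one path runs through A's output logic,
    # the inter-block routing, and B's input logic.
    return logic_a + route + logic_b


def fmax_mhz(critical_path_ns):
    """Clock frequency limit implied by a critical path (setup time ignored)."""
    return 1000.0 / critical_path_ns


before = worst_path_ns(2.0, 3.0, 2.5, registered_boundary=False)
after = worst_path_ns(2.0, 3.0, 2.5, registered_boundary=True)
print(f"unregistered: {before} ns, registered: {after} ns")
```

The trade-off is latency: each boundary register adds a clock cycle, which the rest of the design has to account for.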
I know I'm not expanding any documentation, but if either of my "it worked for me" ideas helps, it was worth starting the weekend at 5:05 instead of 4:55.