Rysc,
In answer to some of the things you mentioned or asked two postings ago:
Bottom-up design; You're right. My previous work never included importing the placement and routing of lower-level designs into higher levels like this. This is new to me and is driven by some of the design requirements. (That was a question you asked, and I'll answer it further below.)
Part of my history with this, and a big success, was a design I did some 9 years ago. The FPGA, a 10K50 then, had to interface to the PC104/ISA bus. To get the fastest access, it was mapped into the available memory address space, but it was not memory. The problem occurred when the Real Time Operating System started up and the boot code went to find and test all available memory. When it hit the FPGA addresses the tests failed and the system hung. The quick and easy solution was to wait 10 seconds after the FPGA was programmed before responding to any accesses on the bus. But try running a top-level simulation that includes a 10 second waiting period at the start. (Yeah, there are ways around that, and I did use them, but it was never a thorough simulation.) Much to my relief, once my post Place And Route simulations of the lower-level functions worked, those functions worked correctly in the top-level device without my first simulating the top level.
Now, this was MaxPlus2 with the built-in simulator, targeting a 10K50 with the highest-speed clock at 10 or 20MHz, so I got away with a lot back then. But this is part of my history.
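To put numbers on why that 10 second hold-off is so painful to simulate, here is a rough Python sketch of the arithmetic. It is not the original design: the function name, the 10MHz figure (the clock was 10 or 20MHz) and the sim_scale knob are assumptions made up for illustration. sim_scale just stands in for one of the usual "ways around that", shortening the delay for simulation only.

```python
import math

def holdoff_counter(clock_hz, delay_s, sim_scale=1):
    """Return (cycles, counter_width_bits) for a power-up hold-off timer.

    clock_hz  : clock driving the counter (assumed value, not from the design)
    delay_s   : how long to ignore bus accesses after configuration
    sim_scale : hypothetical divider used only to shorten simulation runs
    """
    if clock_hz <= 0 or delay_s <= 0 or sim_scale < 1:
        raise ValueError("clock_hz and delay_s must be > 0, sim_scale >= 1")
    # Number of clock cycles the counter has to count before releasing the bus.
    cycles = math.ceil(clock_hz * delay_s / sim_scale)
    # A counter running 0 .. cycles-1 needs this many bits.
    width = max(1, (cycles - 1).bit_length())
    return cycles, width

if __name__ == "__main__":
    # Real hardware: 10 s at an assumed 10 MHz clock.
    print(holdoff_counter(10_000_000, 10))
    # Simulation: same logic, delay scaled down by a million.
    print(holdoff_counter(10_000_000, 10, sim_scale=1_000_000))
```

At an assumed 10MHz that is 100 million clock cycles before the first bus access is even accepted, which is why the full delay never made it into a thorough top-level simulation.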
Today is very different: Quartus2, full ModelSim for simulation, Stratix2s, and pushing the speed envelope for all we can get.
Clocks; Using the technique for grabbing Global resources actually leads to very interesting results. One of the things I've seen is that if, in the lower level, you grab the same Global resource you are going to use at the top level, the routing delay from that resource to the function will be the same in both. Now if you don't include the source of the Global resource, the compiler will automatically attach it to a pin. However, when you import the lower level into the top level, it will drop the pin source and connect the resource to the real source. So what? Well, if your Global resource IS fed from a pin AND you include that pin in the lower level, then the delay from the input pin, through the chosen and assigned Global resource, to the logic using it will be the same in both the lower level AND the top level. How's that?
Goal;
Well, this design, as I said, is going into a Stratix2. The internal routing of the Stratix devices is quite different from that of all the earlier families and some of the earlier/smaller Cyclones. The only things that are routed fully across the device are the Global Globals. This does mean that if you want a data bus to flow through from left to right and match the timing of a Global clock, you've got to do a little pipelining. This is understood.
So goals,
1. To get the routing of the lower-level functions to meet their individual timing constraints, lock that down, and then import that locked placement and routing into the top level. In other words, preserve timing.
2. When recompiling the top level, to change only those lower-level partitions that have been changed and re-imported, and not change any placement or routing of the unchanged lower-level partitions. Is this saying preserve timing again? It will also reduce compile times (but I don't really care about that).
3. To deal with the pipelining requirement stated earlier. It is recognized that as the main data stream, clocked at 125MHz, flows across the device from one partition to another, it will cross a number of routing pathways. These crossings can result in the data no longer lining up with the Global clock. By locking down each partition's placement and routing, running post-PAR simulation and viewing the delays, it can be more easily determined when and where to place locked-down sets of pipeline registers to realign the data with the clock. I don't see how to do this reliably otherwise.
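As a back-of-the-envelope illustration of goal 3, here is a small Python sketch that estimates how many sets of pipeline registers a crossing might need at 125MHz (an 8ns period). Only the 125MHz figure comes from this design. The function names, the path delays and the overhead_ns margin are hypothetical placeholders for numbers you would read out of the post-PAR timing results, and the sketch assumes, optimistically, that the registers can be placed so the delay splits evenly between them.

```python
import math

def clock_period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    if freq_mhz <= 0:
        raise ValueError("freq_mhz must be > 0")
    return 1000.0 / freq_mhz

def pipeline_stages(path_delay_ns, freq_mhz=125.0, overhead_ns=0.0):
    """Estimate the extra register stages needed along one data path.

    path_delay_ns : routing delay of the crossing (hypothetical post-PAR value)
    freq_mhz      : data clock, 125MHz in this design
    overhead_ns   : assumed per-stage margin (setup, clock-to-out, skew)
    """
    usable = clock_period_ns(freq_mhz) - overhead_ns
    if usable <= 0:
        raise ValueError("overhead_ns leaves no usable time in the period")
    # Number of clock-period-sized segments the path must be cut into.
    segments = math.ceil(path_delay_ns / usable)
    # N segments need N-1 intermediate register sets.
    return max(0, segments - 1)

if __name__ == "__main__":
    # Made-up delays for three partition crossings, in ns.
    for delay in (6.0, 11.5, 19.0):
        print(delay, pipeline_stages(delay, overhead_ns=1.0))
```

The real answer still has to come from the locked-down post-PAR delays, since where the registers can actually sit decides how the delay splits. The sketch only shows the kind of bookkeeping those delays feed into.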