--- Quote Start ---
I have a divided clock clk_div. I constrain it correctly with TimeQuest. I provide clk_div to the fast global clock network.
...
2) My Design is reliable.
Can you suggest good literature for this problem?
--- Quote End ---
Whether you have a generated clock in TimeQuest or a derived-clock setting in the Classic Timing Analyzer assigned to the divided-down clock, the fundamental clock skew issue still exists. I don't know of literature on the subject. I just know you need to be careful when timing analysis is comparing a clock skew to a data path delay.
Global routing is thoroughly characterized, so the timing analysis is very accurate for the small skew within a clock domain on a global. But there are uncertainties in the timing when the clock skew comes from something other than global routing. The timing analysis uses all numbers at the slow PVT extreme or all numbers at the fast PVT extreme (depending on whether you choose the slow or fast model for timing analysis), but for a given path at your particular process, voltage, and temperature combination, the numbers probably are not all at the extreme. The timing analysis has to compare three delays: the clock-path delay to the source register, the clock-path delay to the destination register, and the data-path delay between the registers. The clock skew is the difference between the two clock-path delays. Will the clock skew be a little faster compared to the data delay than the extreme numbers say? Will it be a little slower? With today's timing models for FPGAs (not just Altera FPGAs), you don't know. That's beyond the scope of slow-model and fast-model analysis.
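To make the skew-versus-data-delay comparison concrete, here is a minimal Python sketch of a hold check. All delay values are made-up numbers for illustration, not from any device model, and the 0.70 scale factor is only an assumption about how far the real data delay might drift relative to the clock paths.

```python
def hold_slack(src_clk_delay, dst_clk_delay, data_delay, hold_time=0.0):
    """Hold slack in ns: positive means the hold check passes."""
    # Data launched by the source register arrives at the destination here.
    data_arrival = src_clk_delay + data_delay
    # Data must not change until after the destination clock edge plus hold time.
    data_required = dst_clk_delay + hold_time
    return data_arrival - data_required

# Hypothetical fast-model numbers in ns.
src_clk, dst_clk, data = 1.0, 1.6, 0.8   # clock skew = dst_clk - src_clk = 0.6 ns

# What the analysis reports when every delay sits at the same extreme.
corner_slack = hold_slack(src_clk, dst_clk, data)

# Assumed case: on the real device the data path is faster relative to the
# clock paths than the corner model says (scale factor is a guess).
mismatched_slack = hold_slack(src_clk, dst_clk, data * 0.70)

print(f"slack with all delays at the corner: {corner_slack:+.2f} ns")
print(f"slack if data delay does not track:  {mismatched_slack:+.2f} ns")
```

With these numbers the reported slack is positive, but the check fails once the data delay stops tracking the clock skew. The larger the skew is compared to the data delay, the less mismatch it takes to get there.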
Quartus provides clock setup uncertainty and clock hold uncertainty settings for either analyzer. Cross-domain clock skew between a divided-down clock (even if on a global) for one register and another clock for the other register is one of the cases where I recommend adding some uncertainty, but I can't tell you how much. Most people don't bother. Most people don't think about it in the first place. Some people assume guard bands in the timing analysis take care of it, but I don't like that argument because those guard bands are meant to cover other uncertainties--they weren't necessarily intended to cover this one. To be proper you should make an allowance for the skew-vs.-data-delay uncertainty yourself.
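As a rough illustration of what an uncertainty setting does to the reported numbers, the Python sketch below subtracts a hold uncertainty from a few hold slacks and lists the paths that would no longer pass. The path names, the slack values, and the 0.10 ns figure are all hypothetical; as I said, I can't tell you how much uncertainty is the right amount.

```python
def slack_after_uncertainty(reported_slack, uncertainty):
    """Clock uncertainty tightens the check, so slack shrinks by that amount."""
    return reported_slack - uncertainty

# Hypothetical hold slacks in ns for cross-domain paths (names are made up).
reported_hold_slack = {
    "clk_div -> clk": 0.20,
    "clk -> clk_div": 0.05,
}

# Assumed allowance for the skew-vs.-data-delay uncertainty, in ns.
hold_uncertainty = 0.10

# Paths whose slack goes negative once the allowance is applied.
failing = [
    path
    for path, slack in reported_hold_slack.items()
    if slack_after_uncertainty(slack, hold_uncertainty) < 0
]

print("paths failing hold after uncertainty:", failing)
```

A path that only just passes without the allowance is the one to look at first, since it has the least room for the clock skew and the data delay to drift apart.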
It's better to avoid a divided-down clock in the first place unless you make it global and have no synchronous cross-domain paths so that this uncertainty isn't an issue.
--- Quote Start ---
1) My Design [with a divided clock] will be faster than the clock enable version because I don't have the delay of the (very) high fanout of the clock enable signal, which can slow down the design (can it?).
--- Quote End ---
No matter what the n value is for a divide by n, the clock enable paths have to operate in a single clock cycle. As I've said at http://www.alteraforum.com/forum/showthread.php?p=2255#post2255, nonglobal routing might be better than global even for a high-fan-out clock enable because there is a big delay associated with the global buffer. Try both global and nonglobal to see which gives better slack.
The clock enable might not be as high a fan-out on a single signal as you would expect from the RTL. Synthesis tools tend to include other logic in the clock enable in addition to what is directly implied by the HDL "if" statement for the RTL clock enable. That's why you often see a large number of clock enable signals in the "Control Signals" table in the Fitter compilation report.
If you do have a timing problem from the fan-out on a clock enable, replicate the source of the clock enable. There are multiple ways to do this. At one end, you let the tools do a brute-force replication without regard to where the clock enable destinations need to be placed. At the other end, you do a manual replication in the RTL that intentionally groups the destinations according to where they will be placed on the device.