Altera_Forum
Honored Contributor
10 years ago
The actual multiply only takes a single clock (if you look at the handbook, the multiplier is actually an asynchronous, i.e. purely combinational, component surrounded by registers in the DSP element).
The pipelines are needed to make the routing shorter. The actual multiply may be able to run at 400 MHz, but when you then try to take the result to some other logic somewhere else in the chip, that is what can kill the FMax. Using more pipelining makes it easier for the fitter to shorten these paths between logic. Logic and registers are spread all over the chip, but the DSPs sit only in specific columns (to allow you to pass data from one DSP to another DSP with minimal routing). So it's usually getting data in and out of them that takes the time. Extra pipeline registers (i.e. registers with no logic between them) allow the fitter to place registers right next to the DSP. Without them, the fitter might be pulled between placing a register close to the DSP and placing it close to some other complex logic elsewhere in the design.
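
To put rough numbers on that, here is a toy timing model in Python. It is only a sketch, not how the fitter or the timing analyzer actually works. The 2.5 ns multiply delay comes from the 400 MHz figure above; the 6 ns routing delay and the assumption that extra registers split the route into equal hops are made up for illustration, and the function names are hypothetical.

```python
def critical_path_ns(multiply_ns, routing_ns, extra_regs):
    """Longest register-to-register delay in the toy model.

    Assumption: the route from the DSP to the destination logic is split
    into (extra_regs + 1) equal hops by the extra pipeline registers.
    """
    hop_ns = routing_ns / (extra_regs + 1)
    # The slowest stage sets the clock: either the multiply or one routing hop.
    return max(multiply_ns, hop_ns)


def fmax_mhz(multiply_ns, routing_ns, extra_regs):
    """Convert the critical path in ns to a clock frequency in MHz."""
    return 1000.0 / critical_path_ns(multiply_ns, routing_ns, extra_regs)


if __name__ == "__main__":
    MULTIPLY_NS = 2.5  # 400 MHz multiply, from the post
    ROUTING_NS = 6.0   # hypothetical DSP-to-logic routing delay
    for extra_regs in range(4):
        latency = 1 + extra_regs  # clocks from multiply input to destination
        f = fmax_mhz(MULTIPLY_NS, ROUTING_NS, extra_regs)
        print(f"extra_regs={extra_regs} latency={latency} clk FMax={f:.1f} MHz")
```

In this model the routing hop is the limit until there are enough registers that the multiply itself becomes the slowest stage. After that, more registers only add latency.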