Actually, I replaced the dedicated adder with the Ling adder in the critical path only, and it improved fmax by 7 MHz. I tried using the Ling adder in other paths, but it decreased fmax. So I agree with you: the dedicated FPGA hardware does a good job in most cases.
To constrain my design, I use a .sdc file. It looks like this:
create_clock -name {clk} -period 5.000 -waveform { 0.000 2.500 } [get_ports {clk}]
derive_clocks -period "1.0"
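(For completeness, here is how I imagine the divided clocks could be described explicitly with create_generated_clock instead of relying on derive_clocks alone. This is only a sketch: sys_clk_div_reg is a placeholder name for the real divider flip-flop in my design, not an identifier from my actual project.)

```tcl
# Sketch only: sys_clk_div_reg is a placeholder for the actual
# divide-by-2 flip-flop whose output drives sys_clk.
create_generated_clock -name sys_clk -source [get_ports {clk}] \
    -divide_by 2 [get_registers {sys_clk_div_reg}]
```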
Clk is the source clock (200 MHz). Several dividers are used to derive the three other clocks (sys_clk, sram_clk, sp_clk). The fmax summary shows:
sys_clk fmax = 54 MHz (100 MHz expected because a divider by 2 is used).
sram_clk fmax = 150 MHz (100 MHz expected because a divider by 2 is used).
sp_clk fmax = 120 MHz (50 MHz expected because a divider by 2 is used).
According to these results, sys_clk does not meet the timing requirements. I don't expect to reach 100 MHz, just to get a little closer. So I'm wondering whether I'm constraining the system the right way: maybe Quartus is trying to optimize all the clock frequencies at once, and not sys_clk only. What do you think?
Another question: I'm using Design Space Explorer to find the best settings in terms of performance. The report shows:
BASE SETTINGS: sys_clk worst-case slack: -9.640 ns; sys_clk fmax: 61.2 MHz.
POINT 16 SETTINGS (BEST POINT): sys_clk worst-case slack: -9.323 ns; sys_clk fmax: 56.45 MHz.
The slack is better, but surprisingly, fmax is worse. Why?
Thanks!
Julien