Hi,
Each DSP block can fit three independent 9x9 multipliers. DSP merging kicks in automatically when there are not enough DSPs in the region of interest. The output latency setting is not supposed to affect this. On my side, whether the latency is 1 or 3 clock cycles, the Fitter -> DSP Block Usage Summary report shows independent 9x9 multipliers. Since the attached file contains many DSP configurations, I believe something else is causing the Fitter error. You could start debugging by simplifying the design so that it contains only independent 9x9 multipliers.
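As a rough sanity check while simplifying the design, the packing figure above can be turned into a quick estimate of how many DSP blocks a given number of independent 9x9 multipliers would need. This is only a minimal sketch: the helper name `dsp_blocks_needed` is hypothetical, it assumes ideal packing of three multipliers per block as stated above, and the Fitter may place things differently in practice.

```python
# Assumption taken from the reply above: one DSP block holds up to
# three independent 9x9 multipliers. Adjust this for your device if needed.
MULTS_PER_DSP_BLOCK = 3


def dsp_blocks_needed(num_9x9_multipliers: int) -> int:
    """Lower bound on DSP blocks, assuming ideal packing (hypothetical helper)."""
    if num_9x9_multipliers < 0:
        raise ValueError("multiplier count must be non-negative")
    # Integer ceiling division: round up to a whole number of blocks.
    return (num_9x9_multipliers + MULTS_PER_DSP_BLOCK - 1) // MULTS_PER_DSP_BLOCK


if __name__ == "__main__":
    for n in (1, 3, 4, 10):
        print(n, "multipliers ->", dsp_blocks_needed(n), "DSP block(s)")
```

Comparing this estimate with the count in the DSP Block Usage Summary report may help show whether the simplified design is being packed as expected.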
Regards -SK