Please note that this is not the case with all device families (e.g. Stratix 10 and Agilex are a bit different from previous devices). Have a look at the user guide for your specific device family for more information.
In terms of timing, adding a clock buffer of course introduces an inherent delay: the delay of the buffer itself plus the routing needed to bring your clock signal to the buffer. However, all flip-flops fed by this clock will then see the same delay, so the skew is almost negligible.
You mention that the design you are taking from ASIC has a lot of clock signals (with, I assume, different gating). If this is the case, consider using non-dedicated clock routing resources for clocks with low fanout.
In that case you can just AND the clock with an enable signal to gate it. You will not incur any timing penalty, as the local routing connects the ALMs (of course, the clock picks up an extra delay from passing through an ALM, which you would not have without the clock gating).
For a design like this I suggest turning off automatic global clock promotion in the tool and adding the ALTCLKCTRL IP only for the clocks you want to promote to global, regional, etc.
If you are talking about hundreds of clocks, the FPGA (whichever you choose, from any vendor) will certainly not have enough dedicated clock lines for all of them. I encourage you to look at the user guide for the specific device family.
For the ones with higher fanout, use dedicated lines; for that I suggest using the ALTCLKCTRL IP to get the exact control you want.
The choice between regional and global depends on how widely the logic could be spread across the device. For example, even if the fanout is not very large, sometimes there are other considerations to be made, such as when most of it is block memory or DSP.
Note that in the Chip Planner you can always show the clock regions to get a better understanding of this. You can also force the placement of the logic in a specific part of the device by creating a LogicLock region.
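To make the triage above a bit more concrete, here is a rough planning sketch in Python. It is only an illustration of the idea, not anything Quartus does for you: the function name, the clock names, the fanout numbers, the number of dedicated lines, and the low-fanout limit are all hypothetical, and the real figures must come from your design and the device family user guide. It also assumes that any clock too large for local routing but left without a dedicated line falls back to a synchronous clock enable.

```python
def plan_clock_routing(fanouts, dedicated_lines, low_fanout_limit):
    """Map each clock name to 'dedicated', 'local_gate' or 'clock_enable'.

    fanouts:          dict of clock name -> number of registers it drives
    dedicated_lines:  how many global/regional lines you are willing to spend
    low_fanout_limit: fanout at or below which local routing is acceptable
    All three inputs are assumptions supplied by the designer.
    """
    plan = {}
    remaining = dedicated_lines
    # Highest fanout first, so the scarce dedicated lines go to the largest clocks.
    ranked = sorted(fanouts.items(), key=lambda item: item[1], reverse=True)
    for name, fanout in ranked:
        if fanout <= low_fanout_limit:
            plan[name] = "local_gate"      # AND gate on local routing
        elif remaining > 0:
            plan[name] = "dedicated"       # promote through a clock control block
            remaining -= 1
        else:
            plan[name] = "clock_enable"    # recode as a synchronous enable
    return plan


# Hypothetical example values, for illustration only.
example_fanouts = {"clk_core": 12000, "clk_dsp": 3000, "clk_dbg": 900, "clk_uart": 40}
example_plan = plan_clock_routing(example_fanouts, dedicated_lines=2, low_fanout_limit=64)
for clock_name, choice in example_plan.items():
    print(clock_name, "->", choice)
```

With these made-up numbers the two largest clocks take the dedicated lines, the small UART clock stays on local routing, and the debug clock becomes a clock enable.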
For the remaining clocks, you could just change the coding to implement the clock enable synchronously with the data.
As all registers have an enable input, the tool will map this onto it appropriately.
However, as pointed out by the documentation, this does not reduce the power of the clock line, as the clock line is always toggling (the enable is implemented at LAB or FF level, depending on the family).
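The difference between the two styles can be seen in a small cycle-level model. This is a behavioural sketch in Python, not RTL and not a power estimate: the function name and the example enable/data values are made up, and counting the clock edges that reach the register is only a crude stand-in for clock-line activity.

```python
def simulate(enables, data, gated):
    """Return (register value after each cycle, clock edges seen at the register)."""
    q = 0
    history = []
    edges_at_register = 0
    for en, d in zip(enables, data):
        if gated:
            # Gated clock: the edge only reaches the register while enable is high.
            if en:
                edges_at_register += 1
                q = d
        else:
            # Clock enable: the edge always arrives; enable selects load or hold.
            edges_at_register += 1
            if en:
                q = d
        history.append(q)
    return history, edges_at_register


# Hypothetical stimulus, one entry per clock cycle.
enables = [1, 0, 0, 1, 0, 1, 0, 0]
data = [3, 7, 1, 4, 9, 2, 8, 6]

ce_q, ce_edges = simulate(enables, data, gated=False)
gc_q, gc_edges = simulate(enables, data, gated=True)
print("same register contents:", ce_q == gc_q)
print("edges with clock enable:", ce_edges)
print("edges with gated clock:", gc_edges)
```

Both versions hold the same register contents in every cycle, but with the clock enable the register still sees an edge on all 8 cycles, while the gated version only sees the 3 edges where the enable is high.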
If your aim is to use clock gating to reduce power, you should use the ALTCLKCTRL IP for the clocks with higher fanout.
The enable implemented there (in most families) ties off the clock network itself, so you will get the most power saving.
Lastly, I want to point out that doing what I am suggesting requires that you already have a good understanding of your design. If this is not the case, you can just try to synthesize your logic; by default Quartus should recognize clocks and promote them automatically. Sometimes, though, you want to have better control, and I assume this is your case.