FPGAs have dedicated clock routing resources, which are great for distributing the clock in a low-skew manner, but their insertion delay is fairly large. So when your gated clock is created by coming off the first clock tree, going through logic, then back into a global buffer and through a second clock tree, the imbalance between the original clock and the gated clock can be quite large. Plus there is no easy way to add a matching delay to the first clock tree to compensate. Both of these are quite easy to deal with in an ASIC.
For FPGAs, it is generally recommended to use a clock enable instead of a gated clock to get the same functionality. That won't save power though, since the clock keeps toggling all the way to every destination and you're only disabling the registers there, rather than stopping the clock at its source. Technically you could write the code with two implementations, one that gates the clock at the source for the ASIC and one that uses the clock enable at the destinations for the FPGA. In most cases that's probably not practical.
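To illustrate the point about "same functionality, no power savings", here is a rough cycle-level sketch in Python. It is only an idealized model (no skew, no glitches on the gate), and the function names are made up for this example. Both versions produce the same register output, but only the gated version reduces the number of clock edges that actually reach the register.

```python
def gated_clock_reg(d_seq, en_seq):
    """Register on a gated clock: an edge arrives only when en is high."""
    q, edges, out = 0, 0, []
    for d, en in zip(d_seq, en_seq):
        if en:
            edges += 1   # the gate lets this clock edge through
            q = d        # register captures D on that edge
        out.append(q)
    return out, edges


def clock_enable_reg(d_seq, en_seq):
    """Register with a clock enable: the clock always toggles."""
    q, edges, out = 0, 0, []
    for d, en in zip(d_seq, en_seq):
        edges += 1       # every clock edge reaches the register
        if en:
            q = d        # register updates only when enabled
        out.append(q)
    return out, edges


# Hypothetical stimulus, one entry per clock cycle.
d = [1, 0, 1, 1, 0, 1]
en = [1, 0, 0, 1, 1, 0]

gated_q, gated_edges = gated_clock_reg(d, en)
ce_q, ce_edges = clock_enable_reg(d, en)

print("same output:", gated_q == ce_q)
print("edges seen (gated, enable):", gated_edges, ce_edges)
```

The edge count is just a stand-in for clock tree activity. It isn't a power estimate, but it shows why the clock enable approach doesn't save anything on the clock network itself.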
One thing that might help is the altclkctrl block, which has an enable input. You can route the clock through multiple altclkctrl instances and the resulting clocks will all be nicely aligned. (I'm not sure what device you're targeting, so look up its architecture in the handbook.) You can't do a ton of clocks this way, but if there are only a few major ones, it might work.