Yes, I meant just staggering the switching times. Of course, with PWM the duty cycle is arbitrary, so edges can still coincide, but spreading the start times evenly over the period would probably reduce simultaneous switching.
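To illustrate the idea, here is a rough Python model (not HDL) of evenly spaced start times. The function names, the 8 channels and the 256-tick period are my own assumptions for the example, not taken from your design. It just counts how many outputs toggle on the same counter tick.

```python
def staggered_offsets(n_channels, period):
    """Evenly spaced turn-on offsets (in counter ticks) for n_channels PWM outputs."""
    return [(ch * period) // n_channels for ch in range(n_channels)]


def pwm_level(tick, offset, duty, period):
    """Output is high for `duty` ticks starting at `offset`, wrapping at `period`."""
    return ((tick - offset) % period) < duty


def max_simultaneous_edges(offsets, duties, period):
    """Largest number of outputs that change state on the same tick."""
    worst = 0
    for tick in range(period):
        edges = sum(
            pwm_level(tick, o, d, period) != pwm_level(tick - 1, o, d, period)
            for o, d in zip(offsets, duties)
        )
        worst = max(worst, edges)
    return worst


# Assumed example values: 8 channels, 256-tick period, duty of 100 ticks each.
PERIOD = 256
DUTIES = [100] * 8
aligned = max_simultaneous_edges([0] * 8, DUTIES, PERIOD)
staggered = max_simultaneous_edges(staggered_offsets(8, PERIOD), DUTIES, PERIOD)
```

With these particular duty values, the aligned case toggles all 8 outputs on the same tick, while the staggered case toggles only one at a time. Other duty values can still make a falling edge of one channel coincide with a rising edge of another, which is why I only say "probably".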
I have used 2-layer PCBs for prototypes and some simple FPGA production designs. Basically, I would rely more on generous use of bypass capacitors and an almost continuous ground plane on the bottom layer than on thick supply traces, but the supply wiring shown is acceptable, I think. I'm not sure how the design will look once the signal wiring is included, but I would give it a try.
Regarding the differential clock supply, I was aware that Cyclone III doesn't support a differential I/O standard with 3.3 V VCCIO. With Cyclone II, LVPECL, for example, could be used. So I think the option isn't usable in your design.