If you look at DSP Builder, you will see many examples of time folding.
A faster clock allows for resource sharing at the expense of spaghetti logic.
For example, a 32-tap filter requires 32 multipliers if it runs at the sampling rate. If you run the system clock at, say, 4 times the sampling rate, you need only 32/4 = 8 multipliers: on each clock cycle the same 8 multipliers handle a different group of 8 taps, and the partial results are accumulated over the 4 cycles. This saves resources but makes life difficult...
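A minimal behavioural sketch of the folding described above, assuming a 32-tap filter and a hypothetical folding factor of 4 (system clock / sample rate). It is a cycle-level Python model, not HDL and not what DSP Builder generates; the names `folded_fir`, `direct_fir`, `FOLD` and `N_MULT` are made up for illustration. It checks that reusing 8 multipliers over 4 cycles gives the same result as 32 multipliers in one cycle.

```python
# Cycle-level model of a time-folded FIR filter (a behavioural sketch, not HDL).
# Assumption: the system clock runs FOLD times faster than the sample rate,
# so each output sample gets FOLD clock cycles and a bank of N_MULT
# multipliers is reused on every one of those cycles.

TAPS = 32
FOLD = 4                  # system clock / sample rate (hypothetical value)
N_MULT = TAPS // FOLD     # 8 physical multipliers


def folded_fir(x, h):
    assert len(h) == TAPS and TAPS % FOLD == 0
    delay = [0] * TAPS    # tap delay line; delay[k] holds x[n - k]
    y = []
    for sample in x:                      # one sample period
        delay = [sample] + delay[:-1]     # shift the new sample in
        acc = 0                           # accumulator register
        for cycle in range(FOLD):         # FOLD system-clock cycles per sample
            base = cycle * N_MULT
            # the same N_MULT multipliers handle a different group of taps each cycle
            partial = sum(h[base + m] * delay[base + m] for m in range(N_MULT))
            acc += partial                # accumulate the partial sum
        y.append(acc)
    return y


def direct_fir(x, h):
    # reference: all TAPS multipliers used in parallel at the sample rate
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


h = list(range(1, TAPS + 1))      # arbitrary test coefficients
x = [1, -2, 3, 0, 5] + [0] * 40   # arbitrary test input
print(folded_fir(x, h) == direct_fir(x, h))   # True
```

In hardware the price shows up as the extra muxing of coefficients and delay-line taps into the shared multipliers, plus the accumulator and its control, which is the "spaghetti" part.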
Conversely, you may split the data path into, say, two channels (odd/even samples), each running at half speed, and process them in parallel up to some point. This doubles some of the resources but may help you avoid a fast clock when closing timing.
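One common way to build the odd/even split is a two-lane polyphase FIR; the post does not say which structure is meant, so treat this as an assumption. In this sketch the input and the coefficients are each split into even and odd phases, four half-length sub-filters run at half the sample rate, and the two output phases are re-interleaved. The function names are hypothetical, and this is a behavioural Python model, not HDL.

```python
# Two-lane (even/odd) parallel FIR via polyphase decomposition.
# A behavioural sketch, not HDL. Assumption: each lane runs at half the
# sample rate; the polyphase split is one common way to build such lanes.


def half_rate_fir(xs, hs):
    # plain FIR on a half-rate stream: out[k] = sum_m hs[m] * xs[k - m]
    return [sum(hs[m] * xs[k - m] for m in range(len(hs)) if k - m >= 0)
            for k in range(len(xs))]


def two_lane_fir(x, h):
    assert len(x) % 2 == 0, "input is consumed in even/odd pairs"
    x_even, x_odd = x[0::2], x[1::2]    # the two half-rate data channels
    h_even, h_odd = h[0::2], h[1::2]    # polyphase split of the coefficients

    # four half-length sub-filters, all running at half the sample rate:
    # 2 * len(h) multipliers in total, versus len(h) for one full-rate filter
    a = half_rate_fir(x_even, h_even)
    b = half_rate_fir(x_odd, h_odd)
    c = half_rate_fir(x_odd, h_even)
    d = half_rate_fir(x_even, h_odd)

    # even outputs need the odd-phase term delayed by one half-rate cycle
    y_even = [a[k] + (b[k - 1] if k > 0 else 0) for k in range(len(a))]
    y_odd = [c[k] + d[k] for k in range(len(c))]

    y = [0] * len(x)
    y[0::2] = y_even                    # re-interleave into one full-rate stream
    y[1::2] = y_odd
    return y
```

The result matches an ordinary full-rate FIR sample for sample; the multiplier count doubles, but no part of the datapath has to run faster than half the sample rate.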
The notion is all about the speed/resource trade-off.
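The trade-off can be put into rough numbers. This sketch uses a hypothetical helper, `plan_fir`, that assumes one multiply per multiplier per clock cycle and a naive parallel structure, and it ignores pipelining, coefficient symmetry and control overhead. It estimates the multiplier count for a given filter length, sample rate and fabric clock.

```python
import math


def plan_fir(taps, sample_rate_hz, fabric_clock_hz):
    """Rough multiplier estimate for a taps-long FIR (hypothetical helper).

    Assumes one multiply per multiplier per clock cycle and a naive
    parallel structure; ignores pipelining, symmetry and control logic.
    """
    if fabric_clock_hz >= sample_rate_hz:
        # fast clock: fold, i.e. reuse each multiplier over several cycles
        fold = fabric_clock_hz // sample_rate_hz
        return {"mode": "folded", "fold": fold,
                "multipliers": math.ceil(taps / fold)}
    # slow clock: split into parallel lanes, each at a fraction of the sample rate
    lanes = math.ceil(sample_rate_hz / fabric_clock_hz)
    return {"mode": "parallel", "lanes": lanes, "multipliers": taps * lanes}


print(plan_fir(32, 100_000_000, 400_000_000))
print(plan_fir(32, 200_000_000, 100_000_000))
```

With 32 taps, a 4x clock gives the 8 multipliers from the example above. Running the fabric at half the sample rate instead needs 2 lanes and 64 multipliers.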