The embedded multipliers achieve their highest performance when used with their associated registers. But they can also be used in an unregistered "combinational" mode, e.g. when combining a multiply and an addition operation. The embedded multiplier block diagram in the Cyclone hardware manual shows the available options.
When inferring multipliers from HDL code, which is probably the most popular way to use them, you control registering by performing the signal assignments under a clock edge sensitive condition.
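As a minimal sketch of the difference, the VHDL below infers one multiplier with registered inputs and output and one unregistered multiply-add. The entity name, port names and the 18-bit operand width are assumptions chosen for illustration, and whether the synthesis tool actually packs the registers into the multiplier block depends on the tool and device settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity; names and widths are illustrative assumptions.
entity mult_example is
  port (
    clk      : in  std_logic;
    a, b     : in  signed(17 downto 0);
    c        : in  signed(35 downto 0);
    p_reg    : out signed(35 downto 0);  -- product with registered inputs and output
    mac_comb : out signed(35 downto 0)   -- unregistered multiply-add result
  );
end entity mult_example;

architecture rtl of mult_example is
  signal a_r, b_r : signed(17 downto 0);
begin
  -- Registered: assignments under a clock edge condition describe
  -- input and output registers around the multiplier.
  process (clk)
  begin
    if rising_edge(clk) then
      a_r   <= a;          -- input registers
      b_r   <= b;
      p_reg <= a_r * b_r;  -- output register
    end if;
  end process;

  -- Unregistered: a concurrent assignment describes a purely
  -- combinational multiply followed by an addition.
  mac_comb <= a * b + c;
end architecture rtl;
```

In this sketch the registered product appears two clock cycles after the operands, while the combinational result follows the inputs directly, so its whole delay counts against the clock period of whatever logic surrounds it.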
When using multipliers at moderate clock speeds, e.g. 30 to 50 MHz, it's generally not necessary to register every multiplier output, let alone the inputs as well. Timing analysis will tell you what's possible.