I think you're getting the idea. Did you build a decode entity? A simple decoder is nothing more than a chunk of combinational logic that takes the opcode as input and asserts control signals. Again, in a real hardware system, one doesn't think in terms of "only execute this function at a particular point in time". A chunk of combinational logic always reacts to changes in its inputs, and that's OK. It only matters that the comb logic has the correct inputs when you care about (read: sample) its outputs.
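To make the "pure function of its inputs" idea concrete, here is a small Python behavioral model of a decoder. It is a sketch, not HDL: the 4-bit opcodes, their meanings (ADD/LOAD/STORE/BEQ), and the `Controls` signal names are all hypothetical. In VHDL you would typically express the same thing as a `case` statement in a process sensitive to the opcode, or as concurrent `with ... select` assignments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Controls:
    """Hypothetical control-signal bundle; a real design picks its own."""
    reg_write: bool = False
    mem_read: bool = False
    mem_write: bool = False
    alu_sub: bool = False


# Hypothetical 4-bit opcode map (not taken from any particular ISA).
_TABLE = {
    0x0: Controls(reg_write=True),                 # ADD
    0x1: Controls(reg_write=True, mem_read=True),  # LOAD
    0x2: Controls(mem_write=True),                 # STORE
    0x3: Controls(alu_sub=True),                   # BEQ: compare by subtracting
}


def decode(opcode: int) -> Controls:
    """Combinational decoder: the output depends only on the current input.

    Evaluating it "too early" is harmless; it simply reflects whatever opcode
    it is given, just as comb logic continuously tracks its inputs.
    """
    # Only the low 4 bits matter; unassigned opcodes decode to all-zero (NOP).
    return _TABLE.get(opcode & 0xF, Controls())
```

For example, `decode(0x1)` gives `Controls(reg_write=True, mem_read=True, mem_write=False, alu_sub=False)`. The function has no state and no notion of "when"; timing only enters once something samples its output on a clock edge.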
Your state machine is responsible for controlling the flow of data between your different hardware blocks. For example, in the fetch phase, it stores the current instruction's opcode in the opcode register. This register feeds the decoder, which generates a bunch of control signals that you can register on the next clock cycle. At least, this is how it might work if you were building a pipelined processor.
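The flow described above (the state machine loads the opcode register during fetch, the decoder continuously computes control signals from it, and those signals are registered on a later clock edge) can be modeled cycle by cycle. The sketch below is a simple multi-cycle machine rather than a full pipeline, and the state names, opcodes, and signal names are hypothetical. Each call to `clock()` stands for one rising edge: every next value is computed from the *current* register contents and then all registers are committed together, mirroring how VHDL signal assignments in a clocked process take effect only after the process suspends.

```python
from enum import Enum


class State(Enum):
    FETCH = 0
    DECODE = 1
    EXECUTE = 2


def decode(opcode):
    """Combinational decoder (hypothetical opcodes): recomputed every cycle
    from whatever the opcode register currently holds."""
    return {
        0x1: {"reg_write": True, "mem_read": True},    # hypothetical LOAD
        0x2: {"reg_write": False, "mem_write": True},  # hypothetical STORE
    }.get(opcode, {})


class Cpu:
    def __init__(self, program):
        self.program = program  # list of opcodes standing in for memory
        self.pc = 0
        self.state = State.FETCH
        self.opcode_reg = 0
        self.ctrl_reg = {}

    def clock(self):
        """One rising edge: compute next values, then commit them at once."""
        ctrl_comb = decode(self.opcode_reg)  # decoder output is always "live"
        next_state, next_pc = self.state, self.pc
        next_opcode, next_ctrl = self.opcode_reg, self.ctrl_reg

        if self.state is State.FETCH:
            next_opcode = self.program[self.pc]  # load the opcode register
            next_pc = self.pc + 1
            next_state = State.DECODE
        elif self.state is State.DECODE:
            next_ctrl = ctrl_comb  # sample decoder outputs into a register
            next_state = State.EXECUTE
        else:  # EXECUTE: datapath would act on ctrl_reg here
            next_state = State.FETCH

        # Commit all registers simultaneously, like a clock edge.
        self.state, self.pc = next_state, next_pc
        self.opcode_reg, self.ctrl_reg = next_opcode, next_ctrl
```

With `cpu = Cpu([0x1, 0x2])`, the first `cpu.clock()` loads `0x1` into `opcode_reg`; the decoder's output changes right away in the model, but `ctrl_reg` only picks it up on the second edge. That one-cycle gap is exactly the "register on the next clock cycle" step, and it is why the decoder being "always on" causes no trouble.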