The rx_done glitch reported in the original post isn't necessarily a problem. As long as the design unit interfacing with uart_rx uses the same clock (which I strongly suggest), it doesn't "see" the glitch, because it only samples rx_done at the clock edge, after the signal has settled. It's more a matter of design topology. If you implement the uart state machine in a single synchronous process, all outputs are automatically registered. But if you prefer a two- or three-process state machine template, I don't see a need to register any internal signal, except for those sent to external pins or foreign clock domains.
Apart from the glitch topic, which isn't an actual problem in my view, I see a different problem at first glance.
You are reading rx into your state machine without first synchronizing (registering) it to your design clock. This can cause unexpected results if an rx edge coincides with the clock edge, e.g. state_reg falling into an illegal and possibly unrecoverable state.
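A minimal sketch of what I mean by synchronizing rx, assuming a plain two-flip-flop synchronizer is sufficient for your clock rate and device. The entity name rx_synchronizer and the signal names rx_meta, rx_reg and rx_synced are made up for illustration; they are not from your code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper entity, not part of the original uart_rx.
entity rx_synchronizer is
  port (
    clk       : in  std_logic;  -- design clock, same clock as the state machine
    rx        : in  std_logic;  -- asynchronous serial input from the pin
    rx_synced : out std_logic   -- rx delayed by two clocks, safe to use in the FSM
  );
end entity rx_synchronizer;

architecture rtl of rx_synchronizer is
  -- Initialized to '1' because a UART line idles high.
  signal rx_meta : std_logic := '1';  -- first stage, may go metastable
  signal rx_reg  : std_logic := '1';  -- second stage, gives the first stage time to settle
begin
  process (clk)
  begin
    if rising_edge(clk) then
      rx_meta <= rx;
      rx_reg  <= rx_meta;
    end if;
  end process;

  rx_synced <= rx_reg;
end architecture rtl;
```

The state machine would then read rx_synced instead of rx. The two-clock delay is negligible compared to a bit period at usual baud rates.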
To filter rx glitches, you may want to re-check the start bit in the middle of the bit period.
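The start bit re-check could look roughly like the sketch below. It assumes an oversampling enable with 16 ticks per bit (called s_tick here) and an already synchronized input (called rx_synced here); the entity name, the generic HALF_BIT_TICKS and the hold state are my own assumptions, since I don't know how your sample timing is generated.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical start bit qualifier, not taken from the original uart_rx.
entity start_bit_check is
  generic (
    HALF_BIT_TICKS : positive := 8  -- assumed: 16 sample ticks per bit
  );
  port (
    clk         : in  std_logic;
    reset       : in  std_logic;  -- synchronous, active high
    s_tick      : in  std_logic;  -- assumed oversampling enable, one clock wide
    rx_synced   : in  std_logic;  -- rx after synchronization to clk
    start_valid : out std_logic   -- one-clock pulse when the start bit is confirmed
  );
end entity start_bit_check;

architecture rtl of start_bit_check is
  type state_t is (idle, start, hold);
  signal state : state_t := idle;
  signal count : natural range 0 to HALF_BIT_TICKS - 1 := 0;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      start_valid <= '0';  -- default, overridden below for one clock
      if reset = '1' then
        state <= idle;
        count <= 0;
      else
        case state is
          when idle =>
            -- A low level is only a start bit candidate.
            if rx_synced = '0' then
              count <= 0;
              state <= start;
            end if;
          when start =>
            if s_tick = '1' then
              if count = HALF_BIT_TICKS - 1 then
                -- Middle of the start bit: check the line again.
                if rx_synced = '0' then
                  start_valid <= '1';
                  state       <= hold;
                else
                  state <= idle;  -- it was a glitch, discard it
                end if;
              else
                count <= count + 1;
              end if;
            end if;
          when hold =>
            -- Placeholder: a full receiver would go on to its data and
            -- stop states here. This sketch just waits for a high level.
            if rx_synced = '1' then
              state <= idle;
            end if;
        end case;
      end if;
    end if;
  end process;
end architecture rtl;
```

In your design this would not be a separate entity. The point is only the second rx test in the start state, which sends the state machine back to idle if the line is no longer low at mid-bit.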