Have you simulated this?
I can't see the specific fault you're after, although I can deduce from your code and screenshot that it appears to be skipping the data you're trying to send from states H1S2, H1S4 etc. So instead, a few general observations.
You have a lot of repetition in your code. There are a lot of states that contain the same code, or subtle variations of the same code. This generally leads to errors. You can update everything except 'tacho_out_reg' and 'state' outside of the state machine. That removes most of the repetition and will shorten your code considerably.
I don't see the point of the 'wait_HxSx' states. You're only in them for a single clock cycle (20ns I deduce from the numbers in your code and post). That's nothing next to a bit period at 9600 baud, which is roughly 104us, or about 5208 of those clock cycles. I suggest you remove 'unnecessary' states. They only add room for error (although I don't think these states are a problem here).
I'll also draw your attention to the way you're updating 'clk_reg' - which may contain the error you're after. You increment it every cycle. However, if it equals 5208 you also set it to zero. The rest of the time, when it doesn't equal 5208 but 'bit_position' equals 31, you also set it to zero (as well as incrementing it). So, you have cases where it's not clear from your code what value 'clk_reg' should take next. What the tools (Quartus) do will be consistent every time you run them. However, it may be that different tools do different things. Either way, I recommend you code it more explicitly. I bet you get a different result if you change the order of these statements in your code.
A more explicit way of coding what, I think, you've intended to code is:
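// Highest priority: all bits sent, so reset both counters and move on
if (bit_position == 10'd31)
begin
    clk_reg <= 16'd0;
    bit_position <= 10'd0;
    state <= wait_H1S2;
end
// One bit period has elapsed: advance to the next bit and restart the count
else if (clk_reg == 16'd5208)
begin
    bit_position <= bit_position + 1'b1;
    clk_reg <= 16'd0;
end
// Otherwise keep counting clock cycles within the current bit
else
    clk_reg <= clk_reg + 1'b1;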
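To try the idea in simulation on its own, here is a minimal sketch of the same priority chain wrapped in a standalone module. I haven't seen the rest of your design, so treat this as an illustration only: the module name 'bit_timer', the ports 'clk', 'rst' and 'frame_done', and the two parameters are all hypothetical names of mine, and 'frame_done' simply stands in for your state change. The values 5208 and 31 are the ones from your code.

```verilog
// Hypothetical standalone sketch. Only clk_reg and bit_position come from
// the original code; every other name here is an assumption.
module bit_timer #(
    parameter [15:0] CLKS_PER_BIT = 16'd5208, // count limit taken from the original code
    parameter [9:0]  LAST_BIT     = 10'd31    // final bit position taken from the original code
) (
    input  wire        clk,          // assumed 50 MHz (20 ns period)
    input  wire        rst,          // assumed synchronous, active-high reset
    output reg  [15:0] clk_reg,      // clock cycles counted within the current bit
    output reg  [9:0]  bit_position, // index of the bit currently being sent
    output reg         frame_done    // one-cycle pulse, stands in for the state change
);

    // Every register gets exactly one assignment on every path through
    // this block, so the next value never depends on statement order.
    always @(posedge clk) begin
        if (rst) begin
            clk_reg      <= 16'd0;
            bit_position <= 10'd0;
            frame_done   <= 1'b0;
        end
        else if (bit_position == LAST_BIT) begin
            // All bits sent: reset both counters and flag completion
            clk_reg      <= 16'd0;
            bit_position <= 10'd0;
            frame_done   <= 1'b1;
        end
        else if (clk_reg == CLKS_PER_BIT) begin
            // Bit period elapsed: move to the next bit
            clk_reg      <= 16'd0;
            bit_position <= bit_position + 1'b1;
            frame_done   <= 1'b0;
        end
        else begin
            // Still inside the current bit: keep counting
            clk_reg      <= clk_reg + 1'b1;
            bit_position <= bit_position;
            frame_done   <= 1'b0;
        end
    end

endmodule
```

The point of writing it this way is that each branch is mutually exclusive and assigns each register once, so reordering the branches' contents can't change the result.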
Cheers,
Alex