You haven't actually made your process synchronous. You've just put the clock in the sensitivity list without using the correct synchronous template. The simulation will appear synchronous, but the real design will not be, because sensitivity lists are ignored for synthesis. You need:
    process(clk)
    begin
        if rising_edge(clk) then
            -- code goes here
        end if;
    end process;
As for your testbench, this is the most basic form of stimulus. You could tidy it up by wrapping the byte writing in a procedure, like this:
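    stim_proc : process
        -- Drives one byte onto rx, one bit per 32 us, followed by a stop bit.
        procedure write_byte(b : std_logic_vector) is
        begin
            -- Note: for a literal such as x"01", b'range is 0 to 7,
            -- so the left-most bit is driven first.
            for i in b'range loop
                rx <= b(i);
                wait for 32 us; -- Why is this 32 us? Why not some form of clock?
            end loop;
            -- write stop bit
            rx <= '1';
            wait for 32 us;
        end procedure;
    begin
        wait until reset = '0';
        write_byte(x"01");
        write_byte(x"4C");
        -- .......
        wait;
    end process;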
I note your use of explicit time delays. Usually you would synchronise the stimulus to some clock instead.
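As a rough sketch of what clock-synchronised stimulus could look like, the testbench below counts clock edges instead of waiting for fixed times. Everything not in your code is an assumption made for illustration: the entity name uart_rx_tb, the constants CLK_PERIOD and CLKS_PER_BIT (chosen so one bit still lasts 32 us), the wait_clks helper, and the way clk and reset are generated. Your design under test is not instantiated here.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity uart_rx_tb is  -- hypothetical testbench name
end entity;

architecture sim of uart_rx_tb is
    -- Assumed values: a 1 us clock and 32 clocks per bit give the
    -- same 32 us bit period as the explicit delays.
    constant CLK_PERIOD   : time     := 1 us;
    constant CLKS_PER_BIT : positive := 32;

    signal clk   : std_logic := '0';
    signal reset : std_logic := '1';
    signal rx    : std_logic := '1';
begin
    -- Free-running clock and a reset released after 10 clock periods.
    clk   <= not clk after CLK_PERIOD / 2;
    reset <= '0' after 10 * CLK_PERIOD;

    -- The design under test would be instantiated here.

    stim_proc : process
        -- Hypothetical helper: block for n rising edges of clk.
        procedure wait_clks(n : positive) is
        begin
            for i in 1 to n loop
                wait until rising_edge(clk);
            end loop;
        end procedure;

        -- Same structure as write_byte above, timed in clock cycles.
        procedure write_byte(b : std_logic_vector) is
        begin
            for i in b'range loop
                rx <= b(i);
                wait_clks(CLKS_PER_BIT);
            end loop;
            -- write stop bit
            rx <= '1';
            wait_clks(CLKS_PER_BIT);
        end procedure;
    begin
        wait until reset = '0';
        wait until rising_edge(clk);  -- align the first bit to a clock edge
        write_byte(x"01");
        write_byte(x"4C");
        wait;
    end process;
end architecture;
```

Because the clock in this sketch never stops, the simulation has to be run for a fixed time rather than until it goes idle. The benefit is that changing the bit rate or clock frequency only means editing the two constants.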