In my opinion, an iteration scheme inside a process can serve a purpose in HDL code for synthesis as well, but it doesn't achieve what you apparently expect from it. Daixiwen already mentioned it: all 6 loop iterations are "executed" simultaneously. An iteration scheme doesn't create a sequential execution flow in time, as e.g. a while loop in a C program does. In synthesizable code it describes parallel logic.
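To illustrate the point, here is a minimal sketch of a loop that synthesizes to purely parallel logic. The entity name bit_reverse and the ports d and q are made up for this example and are not taken from your design:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity: reverses the bit order of a 6-bit vector.
entity bit_reverse is
  port (
    d : in  std_logic_vector(5 downto 0);
    q : out std_logic_vector(5 downto 0)
  );
end entity bit_reverse;

architecture rtl of bit_reverse is
begin
  process (d)
  begin
    -- The loop is unrolled at elaboration: this is equivalent to six
    -- concurrent assignments q(0) <= d(5); q(1) <= d(4); ... q(5) <= d(0);
    for i in 0 to 5 loop
      q(i) <= d(5 - i);
    end loop;
  end process;
end architecture rtl;
```

There is no clock involved and nothing happens "one iteration after another"; the result is just six wires. Spreading an action over several clock cycles needs a register that changes state on each clock edge.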
To perform serial output, you have to define a shift register and send out one bit per clock cycle. That's the simplest solution; there is nothing left to minimize. As an example of the principle:
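-- declared in the architecture declarative part
signal sr: std_logic_vector(5 downto 0);

process (clk)
begin
  if rising_edge(clk) then
    if load = '1' then
      -- parallel load of the 6-bit data word
      sr <= par_data;
    else
      -- shift left by one bit, filling with '0'
      sr <= sr(4 downto 0) & '0';
    end if;
  end if;
end process;

-- the MSB is sent first
ser_out <= sr(5);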