--- Quote Start ---
Your second process is heavily combinatorial and is unlikely to pass timing. I personally always use a one-process state machine. That way I declare one state signal of an enumerated type, say (s0, s1, s2, s3, s4, ...), and assign the next state as follows: if ... state <= s0, if ... state <= s1, and so on. The change of state always happens on the clock edge, and so do any conditions and assignments. To keep the functionality you will need to redesign to account for the added latency.
--- Quote End ---
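For reference, the single-process style described in the quote might look roughly like the sketch below. The entity name, the ports (rst, start, done, busy), the three states and the transition conditions are all made up for illustration; they are not taken from my design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity; names and ports are assumptions.
entity one_process_fsm is
  port (
    clk   : in  std_logic;
    rst   : in  std_logic;  -- synchronous reset (assumed)
    start : in  std_logic;  -- made-up transition condition
    done  : in  std_logic;  -- made-up transition condition
    busy  : out std_logic
  );
end entity;

architecture rtl of one_process_fsm is
  type state_t is (s0, s1, s2);
  signal state : state_t := s0;
begin
  -- Single clocked process: the state and the output are both registers,
  -- so every condition is evaluated on the clock edge.
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        state <= s0;
        busy  <= '0';
      else
        case state is
          when s0 =>
            busy <= '0';
            if start = '1' then
              state <= s1;
              busy  <= '1';
            end if;
          when s1 =>
            if done = '1' then
              state <= s2;
            end if;
          when s2 =>
            busy  <= '0';
            state <= s0;
        end case;
      end if;
    end if;
  end process;
end architecture;
```

Because busy is registered here, it changes one clock after the condition that causes it, which is the latency effect the quote mentions.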
Using timing analysis, I found that the critical path is inside the clocked process, where I store the result in the output register:
    if w_en = '1' then
        for i in joint'left-8 downto DATA_WIDTH loop
            if i = index then
                output_reg <= joint(i + 8 downto i + 8 - DATA_WIDTH + 1) after 1 ns;
            end if;
        end loop;
    end if;
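One way this path could be shortened, if the delay comes from the logic that produces index and joint rather than from the selection itself, is to register those inputs one cycle before the selection. The following is only a sketch under assumptions: the entity name slice_select, the generic JOINT_WIDTH, the default widths, and the type of index (a constrained natural) are all hypothetical, and the extra register stage adds one clock of latency that the rest of the design would have to absorb. The "after 1 ns" delay is left out because synthesis ignores it.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper around the selection logic; all names, widths and
-- the type of index are assumptions made for this sketch.
entity slice_select is
  generic (
    DATA_WIDTH  : positive := 8;   -- assumed value
    JOINT_WIDTH : positive := 24   -- assumed value
  );
  port (
    clk        : in  std_logic;
    w_en       : in  std_logic;
    index      : in  natural range DATA_WIDTH to JOINT_WIDTH - 9;
    joint      : in  std_logic_vector(JOINT_WIDTH - 1 downto 0);
    output_reg : out std_logic_vector(DATA_WIDTH - 1 downto 0)
  );
end entity;

architecture rtl of slice_select is
  -- Stage 1 registers: copies of the selection inputs.
  signal joint_q : std_logic_vector(joint'range);
  signal index_q : natural range DATA_WIDTH to JOINT_WIDTH - 9;
  signal w_en_q  : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- Stage 1: capture the inputs so the upstream combinational logic
      -- ends at a register instead of feeding the selection directly.
      joint_q <= joint;
      index_q <= index;
      w_en_q  <= w_en;

      -- Stage 2: same selection as before, but driven only by registers.
      if w_en_q = '1' then
        for i in joint_q'left - 8 downto DATA_WIDTH loop
          if i = index_q then
            output_reg <= joint_q(i + 8 downto i + 8 - DATA_WIDTH + 1);
          end if;
        end loop;
      end if;
    end if;
  end process;
end architecture;
```

If the timing report shows that the path starts at a register and the delay is in the selection logic itself, this extra stage will not help; in that case the selection would have to be split into a coarse and a fine step across two cycles instead.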
This design needs at least two processes to work properly and to comply with the ITU-T G.709 and G.798 OTN specifications.