I didn't say the last loop iteration is the valid one, but the last drive on any node, i.e. the last assignment to the same target within one loop. Since your targets are now different in each assignment (one per state), there is no problem.
Is there any chance to avoid constructing frames inside the FPGA? I mean, can you index your input samples as they come in, or do you have to store them?
You could also think of two parallel storage blocks (costly in memory, but it may help), i.e. store an incoming frame while processing the one already received, then swap over.
This may give you more time, since processing can then use the whole frame period.
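To make the swap idea concrete, here is a rough behavioural model in Python (not HDL). The class name, the method names and the frame length of 4 are all made up for illustration; in the FPGA this would be two RAM blocks and a one-bit bank select.

```python
class PingPongBuffer:
    """Behavioural sketch of two-bank (ping-pong) frame storage.

    Hypothetical model: one bank is written with incoming samples
    while the other bank, holding the last complete frame, is read.
    """

    def __init__(self, frame_len):
        self.frame_len = frame_len
        self.banks = [[0] * frame_len, [0] * frame_len]
        self.write_bank = 0      # bank currently being filled
        self.write_idx = 0       # next write position in that bank
        self.ready_bank = None   # bank holding the last complete frame

    def push(self, sample):
        """Store one sample; return True when a frame has just completed."""
        self.banks[self.write_bank][self.write_idx] = sample
        self.write_idx += 1
        if self.write_idx == self.frame_len:
            # Frame complete: hand this bank to the processing side
            # and start filling the other one.
            self.ready_bank = self.write_bank
            self.write_bank ^= 1
            self.write_idx = 0
            return True
        return False

    def read_frame(self):
        """Return a copy of the last complete frame, or None if none yet."""
        if self.ready_bank is None:
            return None
        return list(self.banks[self.ready_bank])


buf = PingPongBuffer(frame_len=4)
for s in range(4):
    buf.push(s)
first = buf.read_frame()          # frame 0 is available for processing
buf.push(4)
buf.push(5)
still_first = buf.read_frame()    # unchanged while frame 1 is arriving
```

The point is that the complete frame stays untouched for a full frame period while the next one arrives in the other bank.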
If you need 192 clks to finish off one frame, then how fast is this clk? It is the actual frequency (or available time) that matters now, in order to estimate what fmax you need to get up to.
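As a quick sanity check on the numbers, the arithmetic can be sketched like this. Only the 192 clks per frame comes from your post; the frame rate and the fmax values below are placeholders I made up, so put in your real ones.

```python
def required_clock_hz(clks_per_frame, frames_per_second):
    """Minimum clock frequency needed to finish one frame before the next arrives."""
    return clks_per_frame * frames_per_second


def meets_timing(fmax_hz, clks_per_frame, frames_per_second):
    """True if the reported fmax is at least the required clock frequency."""
    return fmax_hz >= required_clock_hz(clks_per_frame, frames_per_second)


CLKS_PER_FRAME = 192            # from the post
FRAMES_PER_SECOND = 1_000_000   # hypothetical: one frame every 1 us

needed = required_clock_hz(CLKS_PER_FRAME, FRAMES_PER_SECOND)
print(needed)                                                         # 192000000
print(meets_timing(150_000_000, CLKS_PER_FRAME, FRAMES_PER_SECOND))   # False
```

If the required frequency comes out above what the fitter reports as fmax, you either need fewer clks per frame or more time per frame.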