1) The body of a process is always written as sequential statements, but it can synthesize to either sequential or combinational logic. That is a side issue, so don't worry about it. All I want to say for now is that your process should be clocked.
State machine: if you only need two states, just toggle a signal and avoid the headache of a state machine.
2) Your loops are OK now. I was just pointing out that the index should run from 0 to 127 or from 1 to 128 (whichever suits your indexing).
3) Yes, you need to plan how you will use memory, unless you ease up on the matrix size. Why not use 8 bits per element instead of 32 bits?
4) By functional I mean: what is it that you want to do in terms of algorithms?
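To illustrate point 1 and the two-state remark, here is a minimal sketch of a clocked process that toggles a signal instead of using a state machine. The entity name, the port names (clk, rst, phase) and the active-high synchronous reset are my assumptions, not taken from your code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: names and reset style are assumptions.
entity toggle_example is
  port (
    clk   : in  std_logic;
    rst   : in  std_logic;  -- synchronous reset, assumed active high
    phase : out std_logic   -- '0' = first state, '1' = second state
  );
end entity toggle_example;

architecture rtl of toggle_example is
  signal phase_i : std_logic := '0';
begin
  -- Clocked process: only clk is in the sensitivity list,
  -- and every assignment sits inside rising_edge(clk).
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        phase_i <= '0';
      else
        phase_i <= not phase_i;  -- flip between the two "states"
      end if;
    end if;
  end process;

  phase <= phase_i;
end architecture rtl;
```

Here the toggle flips on every clock; in a real design you would probably flip it only when some enable or "done" condition is true.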
The crucial point in your plan must be the speed of processing versus the rate of incoming data: if the logic runs faster than the data arrives, you can share resources between operations. Otherwise you will end up with serious resource problems.
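As a rough sketch of points 2 and 3, a memory indexed 0 to 127 with 8-bit entries could be described as below. The entity and port names, and the choice of 128 entries per memory, are assumptions for illustration only. Whether this maps to block RAM depends on your synthesis tool and device.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: one 128-entry memory of 8-bit words.
entity row_ram is
  port (
    clk  : in  std_logic;
    we   : in  std_logic;                     -- write enable
    addr : in  unsigned(6 downto 0);          -- 7 bits address 0 to 127
    din  : in  std_logic_vector(7 downto 0);
    dout : out std_logic_vector(7 downto 0)
  );
end entity row_ram;

architecture rtl of row_ram is
  -- Index range 0 to 127 matches the 7-bit address directly.
  type ram_t is array (0 to 127) of std_logic_vector(7 downto 0);
  signal ram : ram_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(addr)) <= din;
      end if;
      -- Registered read: returns the old value on a write to the same address.
      dout <= ram(to_integer(addr));
    end if;
  end process;
end architecture rtl;
```

With these numbers, 128 entries of 8 bits need 1024 bits of storage, against 4096 bits for 128 entries of 32 bits, which is why the narrower word helps.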