I'm trying to design a single CPU that will be instantiated 16 times in a systolic array configuration for multiplying two 4x4 matrices. The arrays created in this code are just samples: the first row of one matrix and the first column of the other.
The end result has to be a 16-CPU systolic array that takes in two 4x4 matrices, one entering from the west a column per clock and one entering from the north a row per clock, and computes their product.
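To make the intended data flow concrete, here is a rough software model in Python. It is not the hardware description, only a cycle-by-cycle sketch. It assumes an output-stationary design: each CPU keeps its own accumulator, multiplies the value arriving from the west by the value arriving from the north, then passes both on to the east and south one clock later. It also assumes the inputs are skewed, so row i of the west matrix and column j of the north matrix are delayed by i and j clocks respectively. The names `systolic_matmul`, `a_reg`, `b_reg` and `acc` are made up for this illustration.

```python
def systolic_matmul(a, b):
    """Cycle-level model of an n x n output-stationary systolic array.

    Assumption: edge inputs are skewed (row i / column j delayed by i / j
    clocks) and zeros are fed when no valid data is available.
    """
    n = len(a)
    acc = [[0] * n for _ in range(n)]    # one accumulator per CPU
    a_reg = [[0] * n for _ in range(n)]  # value each CPU forwards east
    b_reg = [[0] * n for _ in range(n)]  # value each CPU forwards south

    # The last product reaches the bottom-right CPU at clock 3n - 3.
    for t in range(3 * n - 2):
        new_a = [[0] * n for _ in range(n)]
        new_b = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if j == 0:
                    # West edge: element k of row i, delayed by i clocks.
                    k = t - i
                    a_in = a[i][k] if 0 <= k < n else 0
                else:
                    a_in = a_reg[i][j - 1]  # registered output of west neighbour
                if i == 0:
                    # North edge: element k of column j, delayed by j clocks.
                    k = t - j
                    b_in = b[k][j] if 0 <= k < n else 0
                else:
                    b_in = b_reg[i - 1][j]  # registered output of north neighbour
                acc[i][j] += a_in * b_in    # multiply-accumulate
                new_a[i][j] = a_in
                new_b[i][j] = b_in
        # All registers update together, like a clock edge.
        a_reg, b_reg = new_a, new_b
    return acc


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
identity = [[1 if r == c else 0 for c in range(4)] for r in range(4)]
assert systolic_matmul(a, identity) == a  # multiplying by identity returns a
```

The point the model illustrates is that every CPU only sees its neighbours' values from the previous clock, which is why the new register values are computed into separate arrays and swapped in at the end of each cycle.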
I understand now what you mean about the delay in the if statement. I'm reading through the references I have to learn how to do it correctly, but no luck so far.