First, I recommend you go and turn your original code into a proper multiply/accumulate pipeline before trying to modify it; otherwise it will never work on an FPGA at any useful speed. Then we can think about modifying the weights based on the result.
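To make "proper multiply/accumulate pipeline" a bit more concrete, here is a rough cycle-by-cycle software model in Python. It is not HDL and not your design, just a sketch under my own assumptions: two stages (a registered multiply feeding a registered accumulate), integer data, and one new input pair per clock. The names `MacPipeline` and `dot_product` are made up for the illustration.

```python
class MacPipeline:
    """Cycle-level model of an assumed two-stage multiply/accumulate pipeline."""

    def __init__(self):
        self.prod_reg = 0        # stage 1 register: the product x * w
        self.prod_valid = False  # valid flag that travels with the product
        self.acc = 0             # stage 2 register: the running sum

    def clock(self, x=0, w=0, in_valid=False):
        """Advance one clock edge and return the accumulator value."""
        # Stage 2 consumes what stage 1 registered on the previous clock.
        if self.prod_valid:
            self.acc += self.prod_reg
        # Stage 1 registers the product of the inputs presented this clock.
        self.prod_reg = x * w
        self.prod_valid = in_valid
        return self.acc


def dot_product(xs, ws):
    """Feed one (x, w) pair per clock, then flush; return (result, clocks)."""
    pipe = MacPipeline()
    clocks = 0
    for x, w in zip(xs, ws):
        pipe.clock(x, w, in_valid=True)
        clocks += 1
    # One extra clock so the last product reaches the accumulator.
    result = pipe.clock()
    clocks += 1
    return result, clocks
```

The point of the model is that the multiply and the add each sit behind their own register, so a new pair can enter every clock and the total time is the number of inputs plus the pipeline depth, not a multiple of it.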
The problem is that what you want to do means having a variable-length pipeline, because it is going to require repeated iterations until the weights are correct. With a properly built pipeline, the number of clocks it takes to complete will depend on the number of iterations. You will need a valid output to tell the next block when you have completed the weight adjustment.
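Here is a toy Python model of that variable latency and of the valid output. Everything specific in it is my assumption, since I don't know your update rule: a single integer weight, a step of plus or minus 1 per iteration, an exact-match stopping condition, and a fixed `PIPELINE_LATENCY` of 2 clocks per pass through the multiply/accumulate pipeline. `WeightAdjuster` and `run_until_valid` are hypothetical names.

```python
PIPELINE_LATENCY = 2  # assumed clocks for one pass through the MAC pipeline


class WeightAdjuster:
    """Toy model: iterate on one weight until w * x equals the target."""

    def __init__(self, x, target, w=0):
        self.x = x
        self.target = target
        self.w = w
        self.wait = PIPELINE_LATENCY  # clocks left until the MAC result is out
        self.valid = False            # tells the next block the weight is final

    def clock(self):
        """Advance one clock edge."""
        if self.valid:
            return  # finished: hold the weight and keep valid asserted
        self.wait -= 1
        if self.wait > 0:
            return  # result of this pass is still inside the pipeline
        error = self.target - self.w * self.x
        if error == 0:
            self.valid = True
        else:
            # Assumed update rule: step the weight by one towards the target.
            self.w += 1 if error * self.x > 0 else -1
            self.wait = PIPELINE_LATENCY  # start another pass


def run_until_valid(adj, max_clocks=1000):
    """Clock the model until valid is asserted; return the clocks used."""
    clocks = 0
    while not adj.valid and clocks < max_clocks:
        adj.clock()
        clocks += 1
    return clocks
```

With `x = 3`, a target of 6 needs fewer iterations than a target of 12, so `run_until_valid` returns a different clock count for each. The downstream block cannot count clocks to know when the weight is ready; it has to watch `valid`. The `max_clocks` limit is there because this toy rule never settles if the target is not an exact multiple of `x`.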