--- Quote Start ---
I don't see that the design is completely pipelined. All statements involving three additions can be further pipelined, e.g.
--- Quote End ---
Yes, that's true. I assumed, however, that two 12-bit additions in one cycle wouldn't be an issue at 48 MHz.
An average of 9 needs a divider or an approximation by integer multiply/shift. But with Cyclone, the integer multiply is converted to multiple additions and may cause timing problems as well. Alternatively, the divider can be pipelined using a MegaFunction. Because dividers are resource-consuming, I generally use a serial divider wherever applicable. For example, it takes 4 cycles for a /9 division.
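To illustrate the multiply/shift approximation, here is a rough bit-level model in Python (not HDL). It assumes the value being divided is the sum of nine unsigned 12-bit samples, so at most 9 * 4095 = 36855. The names and the 19-bit shift are my own choices for the sketch, not taken from the design under discussion. The second function spells the multiply out as shifted additions, which is roughly what happens when no hardware multiplier is available.

```python
# Bit-level model of divide-by-9 via multiply/shift.
# Assumption: the input is a sum of nine unsigned 12-bit samples.
MAX_SUM = 9 * 4095              # 36855, fits in 16 bits
SHIFT = 19                      # assumed shift, chosen for this input range
RECIP = -(-(1 << SHIFT) // 9)   # ceil(2**19 / 9) = 58255 = 0xE38F


def div9_mul_shift(total: int) -> int:
    """Approximate total / 9 with one constant multiply and a right shift."""
    return (total * RECIP) >> SHIFT


def div9_shift_add(total: int) -> int:
    """Same computation with the multiply expanded into shifted additions.

    One addition is needed for every set bit of RECIP, which is why the
    adder chain can become a timing problem if it is not pipelined.
    """
    acc = 0
    for bit in range(RECIP.bit_length()):
        if (RECIP >> bit) & 1:
            acc += total << bit
    return acc >> SHIFT


if __name__ == "__main__":
    # Compare against true integer division over the whole assumed range.
    mismatches = [t for t in range(MAX_SUM + 1) if div9_mul_shift(t) != t // 9]
    print("mismatches:", len(mismatches))
```

With a wider input range the constant and shift would have to be rechecked, since the rounding error of the reciprocal grows with the input value.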