--- Quote Start ---
Since your input is complex, square Re (Re*Re) and square Im (Im*Im), add the two, then accumulate this result over, say, 2^20 samples.
The accumulator will need 20 extra bits (over the width of the adder result) to avoid overflow. For the 1/n, discard the 20 LSBs when you read the final result, before clearing it to restart.
For the square root, avoid it if you don't need it; otherwise use a LUT or an IP core.
--- Quote End ---
Dear Kaz,
Please correct me if I'm wrong. I think that in the worst case I need one extra bit at each addition. In that case, if I add 2^20 samples, I will need 2^20 extra bits... or not?
Probably I just have to try it and find the number of bits needed to accumulate 2^20 samples without overflow.
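One way to try this without building anything is to compute the worst-case sum directly. The sketch below is plain Python rather than HDL, and the names `accumulator_bits` and `extra_bits`, as well as the 32-bit adder width, are assumptions chosen only for illustration. It suggests that an extra bit is needed only each time the number of accumulated samples doubles, so 2^20 samples would cost about 20 extra bits rather than 2^20.

```python
def accumulator_bits(sample_bits: int, num_samples: int) -> int:
    """Bits needed to hold the worst-case sum of unsigned samples."""
    max_sample = (1 << sample_bits) - 1        # largest unsigned value per sample
    worst_case_sum = num_samples * max_sample  # every sample at full scale
    return worst_case_sum.bit_length()


def extra_bits(sample_bits: int, num_samples: int) -> int:
    """Growth of the accumulator width over the width of one sample."""
    return accumulator_bits(sample_bits, num_samples) - sample_bits


ADDER_BITS = 32  # assumed width of Re*Re + Im*Im, not taken from the thread

print(extra_bits(ADDER_BITS, 2))        # 1
print(extra_bits(ADDER_BITS, 4))        # 2
print(extra_bits(ADDER_BITS, 1 << 20))  # 20

# Dividing by n = 2^20 is then just dropping the 20 LSBs of the accumulator.
worst_sum = (1 << 20) * ((1 << ADDER_BITS) - 1)
mean = worst_sum >> 20
```

With these assumed numbers the accumulator would be 52 bits wide, and the shifted result fits back into the original 32 bits.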