--- Quote Start ---
Dear kaz
It's ok. But what about working with 16 bits words ?
If you add up 2^20 samples of 4 you will get 4*2^20 that needs more than 21 bits.. or not ?
--- Quote End ---
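4*2^20 requires 3 bits + 20 bits = 23 bits (the value 4 needs 3 bits unsigned, and summing 2^20 samples adds 20 bits).
In your case you have to multiply, say, 8 bits * 8 bits => 16 bits; then 16 bits + 16 bits => 17 bits; then accumulating 2^20 of those: 17 bits + 20 bits => 37 bits.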
You can picture this as a cascade of pairwise additions:
sample1 (17 bits) + sample2 (17 bits) needs 18 bits (res1)
sample3 (17 bits) + sample4 (17 bits) needs 18 bits (res2)
res1 (18 bits) + res2 (18 bits) needs 19 bits
...
So 2^20 samples need 20 cascaded stages of addition, each stage adding one bit: 17 bits + 20 bits = 37 bits.
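
If you want to check the bit counts numerically, here is a small Python sketch. It assumes unsigned samples and worst-case (full-scale) values; the function names accumulator_width and cascade_widths are just made up for this illustration and are not from any library.

```python
def accumulator_width(sample_bits: int, num_samples: int) -> int:
    """Bits needed to hold the sum of num_samples unsigned values,
    each at most sample_bits wide (worst case: all samples full scale)."""
    max_sample = (1 << sample_bits) - 1
    max_sum = max_sample * num_samples
    return max_sum.bit_length()


def cascade_widths(sample_bits: int, stages: int) -> list:
    """Width after each stage of a pairwise adder tree:
    every stage adds two equal-width values, so it grows by one bit."""
    return [sample_bits + stage for stage in range(stages + 1)]


if __name__ == "__main__":
    # 2^20 samples of value 4: 4 * 2^20 = 2^22, which needs 23 bits.
    print((4 * 2**20).bit_length())

    # 17-bit samples accumulated 2^20 times.
    print(accumulator_width(17, 2**20))

    # Adder tree: 17 -> 18 -> 19 -> ... over 20 stages.
    print(cascade_widths(17, 20))
```

Both ways of counting agree: the direct worst-case sum and the stage-by-stage adder tree each end at sample width + log2(number of samples) bits.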