No, that's not the case. I receive 4 bits in each clock, and I have to construct 3 vectors of length 256, so this takes 192 clocks: 192 = 256 * (3/4), where 256 is the length of a vector, 4 is the number of inputs in each clock, and 3 is the number of vectors. That's where the 192 comes from, and sorry for any confusion.
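To double check the arithmetic, here is a minimal sketch in Python. The function and constant names are just my own labels for illustration, and it assumes each vector element arrives as one input.

```python
VECTOR_LENGTH = 256     # length of each vector
NUM_VECTORS = 3         # number of vectors to construct
INPUTS_PER_CLOCK = 4    # inputs received in each clock


def clocks_needed(vector_length, num_vectors, inputs_per_clock):
    """Clocks required to receive every element of every vector."""
    total_inputs = vector_length * num_vectors
    # Ceiling division, so a partially filled last clock still counts.
    return -(-total_inputs // inputs_per_clock)


print(clocks_needed(VECTOR_LENGTH, NUM_VECTORS, INPUTS_PER_CLOCK))  # 192
```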
On the point of using 32 bits, I need a little help. The alpha and beta matrices have to be initialized with zeros, while the last state is initialized with negative infinity. So that the calculation is not saturated by infinity, let's say infinity = 10^5 (a large number), and any number that falls outside this range is normalized to bring it back into the range. So how can I use 8 bits and still represent 10^5 without overflow?
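To make the problem concrete, here is a rough sketch in Python of what I mean. The names are hypothetical, and it assumes signed two's complement storage. It also treats "normalizing" as simply clamping to the range, which may not be the scheme that should really be used.

```python
INF = 10**5  # stand-in for infinity, as described above


def init_metrics(num_states):
    """Zeros everywhere, with the last state set to 'negative infinity'."""
    metrics = [0] * num_states
    metrics[-1] = -INF
    return metrics


def clamp(value, limit=INF):
    """Bring a value back into [-limit, limit] (assumed normalization)."""
    return max(-limit, min(limit, value))


def fits_signed(value, bits):
    """True if value fits in a signed two's complement word of this width."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


print(init_metrics(4))        # [0, 0, 0, -100000]
print(INF.bit_length())       # 17 magnitude bits, plus one sign bit
print(fits_signed(INF, 8))    # False: signed 8 bits only cover -128..127
print(fits_signed(INF, 32))   # True
```

So with 10^5 stored as is, 8 bits overflow right away, which is exactly the part I am stuck on.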