Think of the input clk as a half-byte (nibble) clk: each input clk delivers 4 bits.
Internally you serialise your frame to one bit wide. Your frame ends up 768 bits long for every 256 input bits (i.e. for every 64 input clks). So the internal
bit clk must be at least 12 times faster than the input clk (768 / 64 = 12). Otherwise the serialisation itself cannot keep up, and that is before counting any clks needed for processing. In other words, you must allow for the serialisation factor plus any extras.
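To make the arithmetic explicit, here is a rough sketch in Python (just a calculator, not HDL). The function name and the `extra_clks` parameter are my own, made up for illustration; `extra_clks` stands for whatever processing clks your design needs per frame, which I don't know.

```python
def required_clock_ratio(frame_bits, input_bits, bits_per_input_clk, extra_clks=0):
    """Minimum integer ratio of internal bit clk to input clk.

    frame_bits         -- serialised frame length in bits (768 here)
    input_bits         -- input bits that produce one frame (256 here)
    bits_per_input_clk -- input width per clk (4 for a half-byte clk)
    extra_clks         -- hypothetical per-frame processing overhead, in bit clks
    """
    if input_bits % bits_per_input_clk != 0:
        raise ValueError("input_bits must be a whole number of input clks")
    input_clks = input_bits // bits_per_input_clk   # 256 / 4 = 64
    bit_clks_needed = frame_bits + extra_clks       # one bit clk per serial bit
    # Ceiling division: the ratio must cover every bit clk in the window.
    return -(-bit_clks_needed // input_clks)


# Serialisation only: 768 bit clks in 64 input clks.
print(required_clock_ratio(768, 256, 4))                 # 12
# With an assumed 10 extra bit clks of processing, 12x is no longer enough.
print(required_clock_ratio(768, 256, 4, extra_clks=10))  # 13
```

So 12x is the floor for serialisation alone; any overhead pushes you to the next ratio up.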
I can't help on the issue of the 32 bits, but if your problem is to compare a given data value with 10^5, then store 10^5 somewhere as a constant, compare the other values against it, and put the result on 8 bits.
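As a behavioural sketch of that compare (Python model, not HDL): the flag layout in the result byte is purely my assumption, since you haven't said how the 8-bit result should be encoded. Note that 10^5 needs 17 bits to hold.

```python
THRESHOLD = 10**5  # 100000; needs 17 bits (2**16 < 100000 < 2**17)

# Hypothetical encoding of the 8-bit result (an assumption, not from your spec):
#   bit 0 = equal, bit 1 = greater than, bit 2 = less than, bits 3..7 = 0
FLAG_EQ = 0b001
FLAG_GT = 0b010
FLAG_LT = 0b100


def compare_to_threshold(value, threshold=THRESHOLD):
    """Compare value with the stored constant and return an 8-bit flag byte."""
    if value > threshold:
        flags = FLAG_GT
    elif value < threshold:
        flags = FLAG_LT
    else:
        flags = FLAG_EQ
    return flags & 0xFF  # keep the result within 8 bits


print(THRESHOLD.bit_length())         # 17
print(compare_to_threshold(100000))   # 1 (equal)
print(compare_to_threshold(100001))   # 2 (greater)
print(compare_to_threshold(99999))    # 4 (less)
```

In hardware this would just be a constant and a comparator; the model is only there to pin down what the result byte means before you write it.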