If you need something now, then you will have to sacrifice something.
Binary Multiplier ----> small hardware size, "long" latency (about 35 cycles), need to map hardware (can't use "*" directly).
Parallel Multiplier ----> large hardware size, short latency (probably 1-3 cycles depending on how it's implemented), need to map hardware (can't use "*" directly), and as James mentioned it can impact Fmax greatly.
Software Multiplier (like Kerry suggested) ----> no extra hardware, long latency, no need to map hardware for multiplication (you use "*" in your code).
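To give a rough idea of where the long latency comes from, here is a minimal C sketch of a shift-and-add multiply. I'm assuming that is more or less what the binary multiplier does one bit per clock, and it is also the general kind of loop a software multiply routine falls back on when there is no hardware multiplier. The function name shift_add_mul is just made up for illustration; it isn't taken from any particular toolchain or library.

```c
#include <stdint.h>

/* Shift-and-add multiply of two 32-bit unsigned values.
   Each loop iteration handles one bit of b, which is roughly
   the work a serial hardware multiplier does in one clock cycle. */
uint64_t shift_add_mul(uint32_t a, uint32_t b)
{
    uint64_t product = 0;
    uint64_t addend = a;        /* multiplicand, shifted left each step */

    for (int i = 0; i < 32; i++) {
        if (b & 1u)             /* current bit of the multiplier set? */
            product += addend;  /* accumulate this partial product */
        addend <<= 1;           /* move to the next bit weight */
        b >>= 1;                /* expose the next multiplier bit */
    }
    return product;
}
```

32 passes through the loop for a 32-bit operand is in the same ballpark as the cycle count quoted for the binary multiplier. A parallel multiplier forms all the partial products at once, which is why it is fast but large.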
If any of those sound good to you, let us know and we can provide more info on how to go about using them.