I work as a consultant and have implemented several IIR filters. I typically build the filter from biquad sections and simulate the results to validate stability and to confirm that quantization of the data path doesn't cause a divergence between the hardware and full floating-point MATLAB simulations.
I have found that, depending on the filter characteristics, the number of coefficient bits becomes critical. I typically use the Direct Form I method for fixed-point implementations and scale the result of the summation at each stage, based on my coefficients, to maintain the data path bit width.
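To make the Direct Form I idea above concrete, here is a minimal bit-accurate software model in Python. It is only a sketch under assumptions that are not from the post: Q2.14 coefficients, a signed 16-bit data path, truncation on the scaling shift, and saturation on overflow. The names `quantize`, `saturate` and `biquad_df1` are made up for illustration.

```python
COEFF_FRAC_BITS = 14  # assumption: coefficients stored as Q2.14
DATA_BITS = 16        # assumption: signed 16-bit data path


def quantize(coeff, frac_bits=COEFF_FRAC_BITS):
    """Round a floating-point coefficient to a fixed-point integer."""
    return int(round(coeff * (1 << frac_bits)))


def saturate(value, bits=DATA_BITS):
    """Clamp a value to the signed range of the data path."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def biquad_df1(samples, b, a, frac_bits=COEFF_FRAC_BITS):
    """Direct Form I biquad on integer samples.

    b = (b0, b1, b2) and a = (a1, a2) are quantized integers,
    with a0 assumed to be normalized to 1.
    """
    x1 = x2 = y1 = y2 = 0
    out = []
    for x0 in samples:
        # Single wide accumulator: all five products are summed at
        # full precision before any scaling takes place.
        acc = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        # Scale the sum back to the data path width. Python's >> on a
        # negative int floors, like an arithmetic right shift.
        y0 = saturate(acc >> frac_bits)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
        out.append(y0)
    return out


# Example: y[n] = 0.5*x[n] + 0.5*y[n-1], i.e. b0 = 0.5, a1 = -0.5
b = (quantize(0.5), 0, 0)
a = (quantize(-0.5), 0)
step_response = biquad_df1([1000] * 4, b, a)
```

Running the same input through a floating-point version of the filter and comparing sample by sample is one way to do the divergence check mentioned above. The shift amount here is tied to the coefficient format, so a different coefficient scaling needs a different shift.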
A good website to see the different forms of biquads is
www.earlevel.com/main/2003/02/28/biquads/
What I have found I can do, and that the generated IP cores so far cannot do effectively, is scale my hardware reuse to the performance requirements of the system. I always make my coefficients programmable, and I usually have a state machine that sequences the data through either a single MAC or a single biquad stage, with a programmable (within limits) number of stages that the data is run through. This gives software the ability to change the filter characteristics of the system without rebuilding the FPGA.
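The shared-hardware scheme described above can be modelled in software roughly as follows. This is a hedged sketch, not the actual design: the class `BiquadCascade`, its methods, the Q2.14 coefficient format, the 16-bit data path and the limit of 8 stages are all assumptions chosen for illustration. The loop in `process` plays the role of the state machine stepping one biquad engine through each stage in turn.

```python
class BiquadCascade:
    """Model of one shared biquad engine reused across several stages."""

    def __init__(self, max_stages=8, frac_bits=14, data_bits=16):
        self.max_stages = max_stages
        self.frac_bits = frac_bits
        self.lo = -(1 << (data_bits - 1))
        self.hi = (1 << (data_bits - 1)) - 1
        # Coefficient memory: (b0, b1, b2, a1, a2) per stage,
        # initialized to pass-through (b0 = 1.0, everything else 0).
        self.coeffs = [(1 << frac_bits, 0, 0, 0, 0)] * max_stages
        # Delay-line memory per stage: [x1, x2, y1, y2].
        self.state = [[0, 0, 0, 0] for _ in range(max_stages)]
        self.num_stages = 1

    def write_coeffs(self, stage, b0, b1, b2, a1, a2):
        """Software-visible register write for one stage's coefficients."""
        if not 0 <= stage < self.max_stages:
            raise IndexError("stage out of range")
        self.coeffs[stage] = (b0, b1, b2, a1, a2)
        # Assumption: clear the stage history when coefficients change.
        self.state[stage] = [0, 0, 0, 0]

    def set_num_stages(self, n):
        """Select how many stages each sample is run through."""
        if not 1 <= n <= self.max_stages:
            raise ValueError("stage count out of range")
        self.num_stages = n

    def process(self, x):
        """Run one input sample through the active stages in sequence."""
        for stage in range(self.num_stages):
            b0, b1, b2, a1, a2 = self.coeffs[stage]
            s = self.state[stage]
            acc = b0 * x + b1 * s[0] + b2 * s[1] - a1 * s[2] - a2 * s[3]
            y = max(self.lo, min(self.hi, acc >> self.frac_bits))
            s[1], s[0] = s[0], x  # shift the input history
            s[3], s[2] = s[2], y  # shift the output history
            x = y                 # this stage's output feeds the next
        return x


# Example: two stages, each a plain gain of 0.5 (b0 = 8192 in Q2.14)
filt = BiquadCascade()
filt.write_coeffs(0, 8192, 0, 0, 0, 0)
filt.write_coeffs(1, 8192, 0, 0, 0, 0)
filt.set_num_stages(2)
two_stage_out = filt.process(1000)
filt.set_num_stages(1)
one_stage_out = filt.process(1000)
```

In hardware the trade-off is that each extra stage costs clock cycles per sample rather than extra multipliers, so the maximum stage count is bounded by the ratio of the clock rate to the sample rate.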
I primarily work with Verilog instead of VHDL, but if you would like some help, send me a message and we'll get in touch.
Regards
Pete