Do you want a pipelined divider (a new result every clock cycle after an initial latency) or a serial divider (one result every several clock cycles, so low throughput)?
Pipelined dividers are big because every step of the division gets its own hardware stage, whereas serial dividers are tiny (roughly 2.5 LEs per bit). A serial divider does ordinary long division, reusing the same hardware on each clock cycle, which is why it's so small: the operation is just shift, compare, and subtract. You can also make hybrid serial + parallel dividers that work on multiple bits per clock cycle, trading area for throughput.
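To make the shift, compare, subtract loop concrete, here is a rough software model in Python. It is only a sketch of the idea, not HDL and not taken from any particular core: the name `serial_divide` and the parameters `width` and `bits_per_cycle` are made up for illustration, and it assumes unsigned operands that fit in `width` bits. Each pass of the outer loop stands in for one clock cycle, and setting `bits_per_cycle` above 1 models the hybrid case.

```python
def serial_divide(dividend, divisor, width=8, bits_per_cycle=1):
    """Software model of an unsigned restoring (long-division) divider.

    Hypothetical helper for illustration only. Returns
    (quotient, remainder, cycles), where cycles counts the modeled
    clock cycles.
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    if bits_per_cycle < 1:
        raise ValueError("bits_per_cycle must be at least 1")
    limit = 1 << width
    if not (0 <= dividend < limit and 0 < divisor < limit):
        raise ValueError("operands must be unsigned and fit in 'width' bits")

    quotient = 0
    remainder = 0
    cycles = 0
    bit = width - 1  # index of the next dividend bit, MSB first

    while bit >= 0:
        # One modeled clock cycle: handle up to bits_per_cycle bits.
        for _ in range(min(bits_per_cycle, bit + 1)):
            # Shift: bring the next dividend bit into the remainder.
            remainder = (remainder << 1) | ((dividend >> bit) & 1)
            quotient <<= 1
            # Compare and subtract: if the divisor fits, take it out
            # and record a 1 in the quotient.
            if remainder >= divisor:
                remainder -= divisor
                quotient |= 1
            bit -= 1
        cycles += 1

    return quotient, remainder, cycles


print(serial_divide(200, 7))                    # (28, 4, 8)
print(serial_divide(200, 7, bits_per_cycle=4))  # (28, 4, 2)
```

In hardware the inner loop would be unrolled into `bits_per_cycle` chained compare/subtract stages, so the cycle count drops but the logic per cycle (and the combinational delay) grows. That is the area versus throughput trade-off between the serial and pipelined extremes.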
If you can't find anything, leave me a message and I'll see if I can find one of mine.