--- Quote Start ---
So I'm trying to create a (synchronous) 8-bit loadable counter, as below, but for some reason this synthesizes to around 28 LEs, compared to 11 LEs for the Altera LPM version.
--- Quote End ---
Although you didn't reveal the FPGA family used (4- or 6-input LUTs?), there's obviously something wrong with the comparison. A full-featured 8-bit LPM counter with up/down and asynchronous parallel load enabled never synthesizes in only 11 LEs. I guess you have left some functions unconnected.
I wonder, however, about the purpose of an asynchronous load, which can't be effectively implemented in any recent Altera FPGA family. Maybe you compared incomparable designs?
..............
Just realized that it's actually a synchronous load implemented by a mux. It even implements in 9 LEs on Cyclone III. The shown design differs at least in that it exposes the q_next outputs.
..............
After removing q_next from the interface, register8 is implemented in 9 LEs as well.
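
For reference, here is a small behavioral sketch of what "synchronous load implemented by a mux" means. It is a Python model, not HDL and not the design from the original post; the names next_state, LoadableCounter, load and d are made up for illustration, and an up-only counter is assumed. The next-state value is just a 2:1 mux between the load data and q + 1, and it is only taken over into the register on the clock edge. In the version with the extra port, that intermediate value is what q_next would carry out of the module.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1  # 0xFF for an 8-bit counter


def next_state(q, load, d):
    """Combinational part: 2:1 mux between load data d and q + 1.

    The result is masked to WIDTH bits, so the count wraps at 0xFF.
    """
    return (d if load else q + 1) & MASK


class LoadableCounter:
    """Hypothetical model of an 8-bit counter with synchronous load."""

    def __init__(self):
        self.q = 0  # register contents

    def clock(self, load=False, d=0):
        """One rising clock edge: the register takes over the mux output."""
        self.q = next_state(self.q, load, d)
        return self.q


if __name__ == "__main__":
    c = LoadableCounter()
    c.clock()                   # count up from 0
    c.clock(load=True, d=0xFE)  # load takes effect on this edge
    c.clock()                   # count up again
    c.clock()                   # wraps around
    print(c.q)
```

Because the load goes through the same clocked register as the count, nothing in this model depends on an asynchronous set/reset path.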