Spent some time rewriting modules and managed to fit my entire design with 2% to spare :-) The VGA timing chain itself benefited greatly, saving 20 LEs just by switching over to lpm_counter up counters. The cout is great as it now serves as a load as well as an enable for the vertical timing. Note that you pay 1 LE for the additional cout and sload signals, but despite that, it's all good!
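For anyone trying to picture the cout trick, here is a rough behavioural sketch in Python (not HDL, and not the actual design). It reads the description above as: the horizontal counter's cout reloads the horizontal counter and also enables the vertical counter. Everything specific here is an assumption for illustration: the 10-bit widths, the 800 x 525 totals (the usual 640x480 mode, which the post does not state), and the hypothetical names `UpCounter`, `make_chain`, `tick` and `ticks_per_frame`.

```python
# Assumed totals for 640x480 VGA: 800 pixel clocks per line, 525 lines per frame.
H_TOTAL = 800
V_TOTAL = 525
WIDTH = 10  # assumed counter width; 2**10 = 1024 covers both totals


class UpCounter:
    """Simplified model of an up counter with cnt_en, sload and cout."""

    def __init__(self, width, preset):
        self.max_count = (1 << width) - 1  # cout is high at the all-ones count
        self.preset = preset               # value loaded when sload is high
        self.q = preset

    @property
    def cout(self):
        return self.q == self.max_count

    def clock(self, cnt_en=True, sload=False):
        # In this model a synchronous load takes priority over counting.
        if sload:
            self.q = self.preset
        elif cnt_en:
            self.q = (self.q + 1) & self.max_count


def make_chain():
    # Preload each counter so it hits all-ones after exactly TOTAL clocks.
    h = UpCounter(WIDTH, (1 << WIDTH) - H_TOTAL)
    v = UpCounter(WIDTH, (1 << WIDTH) - V_TOTAL)
    return h, v


def tick(h, v):
    # Sample both carry-outs before the clock edge, as registers would see them.
    h_cout, v_cout = h.cout, v.cout
    # Horizontal cout reloads the horizontal counter...
    h.clock(cnt_en=True, sload=h_cout)
    # ...and enables the vertical counter, which reloads on its own cout.
    v.clock(cnt_en=h_cout, sload=h_cout and v_cout)


def ticks_per_frame():
    """Count clocks until both counters return to their starting state."""
    h, v = make_chain()
    start = (h.q, v.q)
    n = 0
    while True:
        tick(h, v)
        n += 1
        if (h.q, v.q) == start:
            return n


if __name__ == "__main__":
    print(ticks_per_frame())  # 800 * 525 = 420000
```

The point of counting up from a preset rather than from zero is that no separate comparator is needed to detect the end of a line or frame: the carry-out does that job, and the same signal drives the reload and the next stage's enable.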
Having just scored a bunch of EPM7064s on eBay, I'll probably break out the address/IO decoding and some other stuff, but still... lpm functions are your friend if you're tight on space!
Thanks Dave!
-Mux