If you read the resource usage reports for your design, is there a subsystem that uses a lot more than it should? In that case a rewrite of the source, as Tricky suggests, can have tremendous results.
I recently had a design that took too many LEs for my taste, and I found a block that used 90% of the registers. That block was a VHDL array that stored some parameters. Reading the code in more detail showed that what I had described with the array was a triple-port RAM instead of the dual-port RAM I thought I had written. Rewriting the code as a dual-port RAM made the synthesizer use an embedded RAM block instead of registers, and the total number of LEs was divided by 10. So sometimes small changes can lead to big improvements!
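To illustrate the kind of description I mean, here is a minimal sketch of a simple dual-port RAM (one write port, one read port) written so that a synthesizer can usually map it to an embedded memory block. It is not my original code: the entity name, port names and generic defaults are made up for the example, and whether it actually lands in block RAM depends on your device and tool settings, so check the fitter report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical parameter store: one write port, one read port.
entity param_ram is
  generic (
    ADDR_WIDTH : positive := 8;   -- assumed depth of 2**8 words
    DATA_WIDTH : positive := 16   -- assumed word width
  );
  port (
    clk     : in  std_logic;
    we      : in  std_logic;
    wr_addr : in  unsigned(ADDR_WIDTH - 1 downto 0);
    wr_data : in  std_logic_vector(DATA_WIDTH - 1 downto 0);
    rd_addr : in  unsigned(ADDR_WIDTH - 1 downto 0);
    rd_data : out std_logic_vector(DATA_WIDTH - 1 downto 0)
  );
end entity param_ram;

architecture rtl of param_ram is
  type ram_t is array (0 to 2**ADDR_WIDTH - 1)
    of std_logic_vector(DATA_WIDTH - 1 downto 0);
  signal ram : ram_t;  -- no reset on the array: a reset forces registers
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- The only write access to the array.
      if we = '1' then
        ram(to_integer(wr_addr)) <= wr_data;
      end if;
      -- The only read access to the array, registered.
      -- On a same-address read and write this returns the old data.
      rd_data <= ram(to_integer(rd_addr));
    end if;
  end process;
end architecture rtl;
```

Each extra place where the array is indexed with a different address is another port. A second read such as `ram(to_integer(other_addr))` elsewhere in the architecture is how a dual-port memory quietly turns into a triple-port one. Resetting the array contents or reading it asynchronously are other common reasons for the synthesizer to fall back to registers.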
Look especially for parts of the code that could use embedded blocks such as memories and multipliers, and check that the synthesizer really does use the embedded blocks instead of LEs. Some HDL code that looks simple can also use a tremendous number of registers in some cases. Rewriting it, or spreading the work over multiple clock cycles so that several blocks or functions share the same resources, can also give good results.