Max+Plus II could do that about 10 years ago, but it never really worked. It almost always split the design in a way that required far too many pins (devices back then didn't have many, and most blocks have quite a few nets crossing hierarchy boundaries, so it's still a problem now). You would also need a horribly slow clock, since data has to cross between chips, with combinatorial logic in the path, and still arrive within a single clock cycle.
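To put rough numbers on the clock speed point, here is a back-of-the-envelope sketch. All the delay values are made up for illustration (they are not from any datasheet), and the function name is just something I picked. The idea is only that a chip-to-chip path adds output pad delay, board trace delay, and input setup on top of the logic delay, and the whole sum has to fit in one clock period.

```python
def max_clock_mhz(t_co_ns, t_board_ns, t_logic_ns, t_su_ns):
    """Upper bound on clock frequency (MHz) for one register-to-register path.

    t_co_ns    : clock-to-output delay of the sending register (incl. output pad)
    t_board_ns : board trace delay between chips (0 for an on-chip path)
    t_logic_ns : combinatorial logic delay in the path
    t_su_ns    : setup time at the receiving register (incl. input pad)
    """
    period_ns = t_co_ns + t_board_ns + t_logic_ns + t_su_ns
    return 1000.0 / period_ns


# Hypothetical delays, chosen only to show the effect.
on_chip = max_clock_mhz(t_co_ns=1.0, t_board_ns=0.0, t_logic_ns=8.0, t_su_ns=1.0)
cross_chip = max_clock_mhz(t_co_ns=6.0, t_board_ns=2.0, t_logic_ns=8.0, t_su_ns=4.0)

print(f"on-chip path:    {on_chip:.0f} MHz")
print(f"cross-chip path: {cross_chip:.0f} MHz")
```

With these assumed values the same logic runs at half the clock rate once the path leaves the chip, and an automatic partitioner that cuts through the middle of combinatorial logic makes it worse.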
Synplicity Certify (I think that's the one) has some very cool features for doing this, although it's still usually a somewhat manual process. It's far from free, though, and I think it's targeted at ASIC prototyping.