Supporting misaligned transfers isn't as easy as it might seem.
At first sight it might be thought adequate to perform two bus cycles with the appropriate byte enables and shifted data. This in itself is a moderate amount of logic - especially if you don't want to slow down the processor.
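To make the "two bus cycles" idea concrete, here is a rough C model of the split for a 32-bit store. It assumes a little-endian machine with a 32-bit data bus and one byte-enable bit per lane; the names `bus_cycle` and `split_store32` are made up for illustration and don't correspond to any particular processor.

```c
#include <stdint.h>

/* One aligned 32-bit bus cycle (hypothetical model, not real hardware). */
struct bus_cycle {
    uint32_t addr;  /* word-aligned address */
    uint8_t  be;    /* byte enables: bit i set => byte lane i is written */
    uint32_t data;  /* store data shifted onto its byte lanes */
};

/* Split a little-endian 32-bit store at any byte address into one or two
   aligned cycles. Returns the number of cycles filled in. */
static int split_store32(uint32_t addr, uint32_t value, struct bus_cycle out[2])
{
    unsigned off = addr & 3u;               /* byte offset within the word */

    out[0].addr = addr & ~3u;
    out[0].be   = (uint8_t)((0xFu << off) & 0xFu);
    out[0].data = value << (8u * off);
    if (off == 0)
        return 1;                           /* aligned: one cycle is enough */

    /* The remaining high-order bytes go to the low lanes of the next word. */
    out[1].addr = out[0].addr + 4u;
    out[1].be   = (uint8_t)(0xFu >> (4u - off));
    out[1].data = value >> (8u * (4u - off));
    return 2;
}
```

In hardware the equivalent is a shifter on the data path plus a small sequencer to issue the second cycle, which is where the extra logic and the potential slowdown come from.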
The real issues arise when you get a memory error on the second cycle (e.g. an MMU 'page not present' fault, or a TLB miss). By then the first cycle has already completed, so this really requires a mid-instruction fault! For atomic 'compare and swap' type instructions it is all horrid and really not worth thinking about.
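One way to picture the problem is a toy software model of a store that straddles a page boundary. The sketch below is only an illustration, not how any particular CPU does it: it assumes 4 KiB pages, a four-page address space, and hypothetical names (`page_present`, `store32_precise`). It sidesteps the mid-instruction fault by checking both pages before writing anything, so a fault leaves memory untouched.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define NUM_PAGES 4u

static bool    page_present[NUM_PAGES];        /* toy "MMU": is the page mapped? */
static uint8_t memory[NUM_PAGES * PAGE_SIZE];  /* toy backing store */

/* Little-endian 32-bit store at any byte address. Returns false (a "fault")
   without modifying memory if either page touched by the access is absent. */
static bool store32_precise(uint32_t addr, uint32_t value)
{
    if (addr > NUM_PAGES * PAGE_SIZE - 4u)
        return false;                           /* outside the toy address space */

    uint32_t first = addr / PAGE_SIZE;          /* page holding the first byte */
    uint32_t last  = (addr + 3u) / PAGE_SIZE;   /* page holding the last byte */

    /* Probe both translations before committing any byte. */
    if (!page_present[first] || !page_present[last])
        return false;

    for (unsigned i = 0; i < 4; i++)
        memory[addr + i] = (uint8_t)(value >> (8u * i));
    return true;
}
```

Without the up-front check, the bytes in the first page would already be written when the second page faults, and the handler would have to cope with a half-completed instruction. The check costs a second translation lookup on every page-crossing access, and it does nothing to make the two halves atomic.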