Could you add a little more info to the snippet of code? Are m, n and h signals/variables, i.e. they could be any value, or are they part of a for loop? It sounds like they are actual nets in your design. And is this in a clocked process, or a combinatorial line that feeds into a register?
I don't get why there are muxes to look up the x and muxes to feed it back? Even if m, n and h can be "any value", they are the same on the both sides of the equation, i.e. x[3][1] can only get x[3][1] + h, not some other x value. I guess it depends on the synthesis. You could have a 9-bit 36:1 mux that feeds a single adder who's value is decoded to back to one of 36 9-bit registers, or you could have 36 adders and do no muxing. Or something in betwee, say 4 adders, one for each n index. Etc.
What frequnecy are you targeting? How much slack do you have? What device and speed grade? Care to add a .txt of the critical path? Interconnect timing is generally the long part of the delay, but I would really look at how far the hops are, i.e. the placement. You're not going to have fantastic placement because of the nature of the algorithm, i.e. a large number of nodes that cover a decent sized area, muxing down to a single area, then muxing back out.