OK, this is pretty tricky. Is there a clock input to U4, or is it just combinational logic? If it's purely combinational, you should register it (give it a clock) so you can guarantee your timing requirements are met. Alternatively, feed that enable signal into the clock input of U4 rather than its enable input.
You don't need a generated clock constraint (unless you do change the enable to feed the clock input of U4). But as you said, you do need multicycle path constraints between U3/U4 and U6 if the data is always allowed up to 10 clock cycles to reach U6.
For U3 to U6, you need something like this:
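```tcl
# Setup: check the capture at the 10th clock edge after launch instead of the 1st
set_multicycle_path -from [get_pins <output of U3>] -to [get_pins <input of U6>] -setup 10
# Hold: move the hold check back 9 edges, to the launch edge (setup multiplier - 1)
set_multicycle_path -from [get_pins <output of U3>] -to [get_pins <input of U6>] -hold 9
```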
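If U4 also drives U6 directly, that path would get the same pair of constraints. This is a sketch only: `<output of U4>` is a hypothetical placeholder in the same style as the U3 ones, to be replaced with the actual pin names from your design.

```tcl
# Hypothetical U4 -> U6 path; <output of U4> and <input of U6> are placeholders
set_multicycle_path -from [get_pins <output of U4>] -to [get_pins <input of U6>] -setup 10
set_multicycle_path -from [get_pins <output of U4>] -to [get_pins <input of U6>] -hold 9
```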
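This is "opening the window": the setup check is extended to 10 cycles, and the hold check, which would otherwise move along with it to the 9th edge and become far too strict, is pulled back to the launch edge to compensate.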
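A simplified worked version of the window, assuming a single clock of period $T$, zero clock skew, and launch at time 0. Here $t_{\text{arr}}$ is the data arrival time at U6's input (clock-to-out plus path delay), $t_{\text{su}}$ and $t_{\text{h}}$ are U6's setup and hold requirements, and $N$ is the setup multiplier; these symbols are introduced only for illustration.

```latex
\begin{aligned}
\text{setup (default):} &\quad t_{\text{arr}} \le T - t_{\text{su}} \\
\text{setup (-setup } N\text{):} &\quad t_{\text{arr}} \le N T - t_{\text{su}} \\
\text{hold (-setup } N \text{ only):} &\quad t_{\text{arr}} \ge (N-1) T + t_{\text{h}} \\
\text{hold (-setup } N\text{, -hold } N-1\text{):} &\quad t_{\text{arr}} \ge t_{\text{h}}
\end{aligned}
```

With $N = 10$ and a hold multiplier of 9, the data may arrive anywhere between $t_{\text{h}}$ and $10T - t_{\text{su}}$. Without the `-hold 9`, it would have to arrive after $9T + t_{\text{h}}$, which a fast path would fail.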