set_output_delay describes the delay outside the FPGA so the timing analyzer can check that your output meets setup/hold timing at the destination device. It only covers that external delay. If there is logic after the last register, the delay through that logic is not part of the constraint, and it eats into whatever margin is left. The Fitter will try to meet the external timing, but it might not be able to. This is why output registers are recommended: they give a fixed point of reference for the output pin, with only routing delay (no extra logic delay) between the register and the pin.
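As a rough sketch of what the external-delay constraint could look like in an SDC file: the port names (clk, dout), the 10 ns period, and the external numbers (2.0 ns setup, 1.0 ns hold, 0.5 ns board delay) are all made up for illustration, and referencing the output delay to the same clk is an assumption too. Many designs use a virtual clock here instead.

```tcl
# Hypothetical example values, not taken from any real design:
#   external device setup time  tSU   = 2.0 ns
#   external device hold time   tH    = 1.0 ns
#   board trace delay (min=max) = 0.5 ns
create_clock -name clk -period 10.0 [get_ports clk]

# -max covers the setup check at the destination: board delay + tSU
set_output_delay -clock clk -max 2.5 [get_ports dout]

# -min covers the hold check at the destination: board delay - tH
set_output_delay -clock clk -min -0.5 [get_ports dout]
```

Note that nothing in these constraints mentions the logic between the last register and dout. That delay only shows up when the timing analyzer reports the register-to-pin path.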
What you could do, which is messy, is use set_output_delay for the external delay and then use set_max_delay and set_min_delay to constrain the delay through the logic after the register. But adding an extra register stage is almost always the better option, unless your design can't tolerate the extra cycle of latency.
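For what it's worth, the messy set_max_delay/set_min_delay approach might look something like the sketch below. The register name (out_reg), port name (dout), and delay values are hypothetical. How these exceptions combine with a set_output_delay on the same port depends on the tool, so check the resulting timing report rather than trusting the numbers as written.

```tcl
# Hypothetical names and values, for illustration only.
# Bound the path from the last register, through the trailing logic,
# to the output pin.
set_max_delay -from [get_registers {out_reg}] -to [get_ports {dout}] 6.0
set_min_delay -from [get_registers {out_reg}] -to [get_ports {dout}] 0.0
```

With a register placed right at the output, neither of these exceptions should be needed.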