--- Quote Start ---
I spent some time checking the data path. I then realized that this kind of feedback (loopback) is actually launched on one clock and latched on the next. Basically, the change of the output is launched on the current clock, but the feedback is latched on the next clock. This is a typical multicycle case. So if I set the path to be multicycle, with both setup and hold set to 1, it eliminates the violation.
I think the default setting of the fitter is not to prevent register re-timing, which means register re-timing is enabled.
--- Quote End ---
I somewhat disagree.
Any data path is launched on one edge and latched on the next. I think you mean that in your case the source and destination are the same register. That is true, but it still requires a setup of 1 and a hold of 0.
If you make it a setup of 1 and a hold of 1, you are giving the hold check an extra clock period by mistake, and that is why it passes timing.
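To make the arithmetic concrete, here is a rough Python sketch of where the setup and hold check edges land. It assumes a single clock, end (destination-referenced) multicycles, and the SDC convention in which the default hold multicycle is 0. The function name and the 10 ns period are made up for illustration and do not come from your design.

```python
def check_edges(period_ns, setup_mc=1, hold_mc=0):
    """Return (setup_edge, hold_edge) in ns, relative to a launch edge at 0.

    Assumes one clock, end multicycles, and SDC-style numbering
    (default setup multicycle 1, default hold multicycle 0).
    """
    # The setup check uses the latch edge setup_mc periods after launch.
    setup_edge = setup_mc * period_ns
    # The default hold check is one period before the setup latch edge;
    # each extra hold multicycle moves it one more period earlier.
    hold_edge = (setup_mc - 1 - hold_mc) * period_ns
    return setup_edge, hold_edge


for setup_mc, hold_mc in [(1, 0), (1, 1)]:
    s, h = check_edges(10.0, setup_mc, hold_mc)
    print(f"setup={setup_mc} hold={hold_mc}: setup edge {s} ns, hold edge {h} ns")
```

Under these assumptions, a hold of 1 moves the hold check a full period before the launch edge, so the short feedback path is no longer checked against the edge that launched it. The violation disappears from the report, but the race on the real hardware is still there.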
I believe you would do better to insert delay in the inverter path of clk_out, since it is launched too early for the latch over such a short path. It seems the fitter didn't manage it.