In that thread (and many others as well), different methods of implementing logic cell delays have been discussed, including ones written in Verilog. Their limitations have already been mentioned.
Generally, you can trust the compiler tools to arrange the existing routing delays so that the intended timing is achieved, provided your design uses a reasonable clocking scheme or you have defined the respective timing constraints.
In a typical synchronous design, the data path delay limits the number of logic or arithmetic operations that can be chained between two registers driven by the same clock, so you would usually be interested in reducing the delay rather than adding more.
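To put a number on that limit, here is a rough sketch of the register-to-register timing budget. It is only an illustration: the function name and all delay values are made up and not taken from any device datasheet, and the model ignores clock skew and uncertainty and lumps routing into a single per-level delay.

```python
import math

def max_logic_levels(clock_period_ns, t_co_ns, t_su_ns, level_delay_ns):
    """Estimate how many logic levels fit between two registers on one clock.

    Simplified model (an assumption, not a tool's algorithm):
        clock period >= t_co + levels * level_delay + t_su
    """
    # Time left for combinational logic after clock-to-output and setup time.
    budget_ns = clock_period_ns - t_co_ns - t_su_ns
    if budget_ns <= 0:
        return 0
    return math.floor(budget_ns / level_delay_ns)

# Hypothetical figures: 100 MHz clock, 0.5 ns clock-to-output,
# 0.5 ns setup, 1.5 ns per logic level including routing.
levels = max_logic_levels(10.0, 0.5, 0.5, 1.5)
print(levels)  # 6
```

In practice the timing analyzer reports the real slack per path. The point of the sketch is only that every additional delay element eats into the same fixed budget.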