Having said that, I normally don't use delays to get control of input/output data timing issues.
Instead I use Tsu/Th for input data and Tco for output data, then let Quartus do the work of inserting delays. This approach is less confusing. The only drawback is when the delay granularity is too coarse to achieve my figures. In that case I shift the clk phase internally using the lovely PLL.
For example:
If a device sends data and clk with a Tco of 2 ns, and the board delay is estimated at 1 ns for clk and 1.5 ns for data (I am not recommending this), then data lags clk by 2 + 1.5 - 1 = 2.5 ns at the pins. From this I optimise the timing window by entering Tsu/Th for the given frequency, and Quartus inserts the delays.
As you can see, any board delay can simply be carried through to the pins in this way.
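The arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration, not a Quartus feature: the function name `input_window` and the 10 ns clock period are my own assumptions, and it treats every delay as a single figure (min = max) with data launched and captured on the same clock edge type.

```python
def input_window(tco_ext_ns, clk_board_ns, data_board_ns, period_ns):
    """Return (data_after_clk, max_tsu, max_th) at the FPGA pins, in ns.

    Hypothetical helper: assumes single-valued delays (min = max).
    """
    # Data leaves the device Tco after the clock edge; each signal then
    # picks up its own board delay on the way to the FPGA pins.
    data_after_clk = tco_ext_ns + data_board_ns - clk_board_ns
    # Data must be stable Tsu before the next edge: Tsu <= period - offset.
    max_tsu = period_ns - data_after_clk
    # The old value stays on the pin until the new one arrives: Th <= offset.
    max_th = data_after_clk
    return data_after_clk, max_tsu, max_th


# Figures from the example; the 10 ns period (100 MHz) is an assumed value.
offset, tsu, th = input_window(2.0, 1.0, 1.5, period_ns=10.0)
print(offset, tsu, th)  # 2.5 7.5 2.5
```

Any Tsu/Th pair entered in the tool has to fit inside these limits. Real figures would use separate min and max delays, which narrows the window.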
This method also covers cases where the clk is opposite to the data, and it works whether the data is an input, as above, or an output (then you enter Tco instead, taking into account any board delays and the external device's Tsu and Th).
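For the output case, the Tco budget can be sketched the same way. Again this is only a rough illustration under my own assumptions: a common clock feeding both the FPGA and the external device, single-valued delays, and a hypothetical helper `output_tco_budget`. The external Tsu/Th and period values below are made up for the example.

```python
def output_tco_budget(period_ns, tsu_ext_ns, th_ext_ns,
                      data_board_ns, clk_skew_ns=0.0):
    """Return (min_tco, max_tco) allowed at the FPGA output pin, in ns.

    Hypothetical helper. clk_skew_ns is the clock arrival time at the
    external device minus its arrival time at the FPGA.
    """
    # Setup: data must reach the device Tsu before its next clock edge.
    max_tco = period_ns - tsu_ext_ns - data_board_ns + clk_skew_ns
    # Hold: data must not change until Th after the current clock edge.
    min_tco = th_ext_ns - data_board_ns + clk_skew_ns
    return min_tco, max_tco


# Assumed figures: 10 ns period, external Tsu = 2 ns, Th = 1 ns,
# 1.5 ns data board delay, no clock skew.
print(output_tco_budget(10.0, 2.0, 1.0, 1.5))  # (-0.5, 6.5)
```

A negative minimum just means the board delay alone already satisfies the external hold time, so only the maximum Tco matters in that case.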
So in short, I configure my FPGA for Tsu, Th and Tco as required to deal with external devices that have a fixed timing window and various board delays.