So I had a project where we actually needed to do this. Nowadays the IO has dynamically programmable delay elements.
However, if you can't use those, you essentially need to connect a bunch of ALUTs in cascade. Then you need a mux or muxes (more ALUTs) to dynamically select which "tap" you want. You can also allow individual ALUTs in the cascade to be dynamically bypassed. For this to work you have to control the placement of each ALUT so that your design stays reproducible from one compile to the next.
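To make the tap/bypass idea concrete, here is a rough behavioral model in Python (not HDL, and not what we actually ran). The function name, the per-stage delay numbers and the assumption that a bypassed stage adds a fixed `bypass_delay_ps` are all made up for illustration; real numbers depend on your device, placement and routing.

```python
from typing import Sequence


def path_delay_ps(
    stage_delays_ps: Sequence[float],
    bypass: Sequence[bool],
    tap: int,
    bypass_delay_ps: float = 0.0,
) -> float:
    """Nominal delay from the chain input to the selected tap.

    tap = k means the signal is taken after the first k stages
    (tap 0 is the chain input itself). A bypassed stage contributes
    bypass_delay_ps instead of its own delay (hypothetical model).
    """
    if len(stage_delays_ps) != len(bypass):
        raise ValueError("need one bypass flag per stage")
    if not 0 <= tap <= len(stage_delays_ps):
        raise ValueError("tap out of range")

    total = 0.0
    for delay, skipped in zip(stage_delays_ps[:tap], bypass[:tap]):
        total += bypass_delay_ps if skipped else delay
    return total


# Four stages with made-up delays, stage 1 bypassed, tap after stage 2.
example = path_delay_ps([100.0, 110.0, 120.0, 130.0],
                        [False, True, False, False], tap=3)
```

In this model the tap select and the bypass flags are the only "knobs", which matches the two kinds of dynamic control described above. The delay of the tap mux itself is ignored here.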
Then the final step: characterize. You'll have to try the various permutations of your tap and bypass selections and measure what delays you actually get.
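The sweep itself is easy to script. Below is a minimal Python sketch of one way to do it, assuming you have some `measure(tap, bypass)` routine that programs the setting and returns a measured delay. That routine, `characterize`, and the `fake_measure` stand-in with its made-up stage delays are hypothetical; on real hardware `measure` would talk to your board or test setup.

```python
from itertools import product
from typing import Callable, Dict, Tuple

Setting = Tuple[int, Tuple[bool, ...]]  # (tap index, bypass flags before the tap)


def characterize(
    num_stages: int,
    measure: Callable[[int, Tuple[bool, ...]], float],
) -> Dict[Setting, float]:
    """Try every tap / bypass combination and record the measured delay."""
    table: Dict[Setting, float] = {}
    for tap in range(num_stages + 1):
        # Only the stages in front of the selected tap affect the delay.
        for bypass in product((False, True), repeat=tap):
            table[(tap, bypass)] = measure(tap, bypass)
    return table


# Stand-in for a real measurement: fixed, made-up per-stage delays in ps.
STAGE_PS = (100.0, 110.0, 120.0)


def fake_measure(tap: int, bypass: Tuple[bool, ...]) -> float:
    return sum(d for d, skip in zip(STAGE_PS, bypass) if not skip)


table = characterize(len(STAGE_PS), fake_measure)
distinct_delays = sorted(set(table.values()))
```

Looking at `distinct_delays` (or a histogram of the table values) tells you how well the available settings spread over the delay range you care about.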
In our case an exact delay was not required, as we merely needed statistical coverage of delays.