What keeps bothering me is that I'm halfway treating this like a source-synchronous interface even though it isn't really one.
I feel like this must have a simple, obvious solution, since it has to come up all the time. Any time you run, say, a fast SPI link with the clock generated by a state machine, you need to close timing around an external device that outputs data in response to a clock driven from a registered FPGA output.
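To put rough numbers on that loop, here is a back-of-the-envelope sketch of the round-trip setup budget. Everything in it is an assumption for illustration: the function name, the parameter names, and all the delay values are made up, not taken from any datasheet or tool report. It also ignores clock skew, jitter, and hold checks, which a real analysis would have to include.

```python
def round_trip_setup_slack_ns(sys_clk_period_ns, capture_cycles,
                              fpga_clk_to_out_ns, board_out_ns,
                              device_clk_to_data_ns, board_in_ns,
                              fpga_setup_ns):
    """Setup slack for return data captured inside the FPGA.

    Time zero is the internal clock edge that toggles the registered
    clock output (e.g. SCK). All names and values are hypothetical.
    """
    # Path: FPGA register -> pin -> trace -> device -> data out -> trace -> FPGA pin
    data_arrival = (fpga_clk_to_out_ns + board_out_ns
                    + device_clk_to_data_ns + board_in_ns)
    # The capture register samples N internal clock cycles after the launch edge
    data_required = capture_cycles * sys_clk_period_ns - fpga_setup_ns
    return data_required - data_arrival


# Made-up example: 100 MHz internal clock, 13 ns total round trip
delays = dict(fpga_clk_to_out_ns=4.0, board_out_ns=0.5,
              device_clk_to_data_ns=8.0, board_in_ns=0.5,
              fpga_setup_ns=1.0)

one_cycle = round_trip_setup_slack_ns(10.0, 1, **delays)
two_cycles = round_trip_setup_slack_ns(10.0, 2, **delays)
print(f"capture after 1 cycle:  {one_cycle:+.1f} ns")   # negative: fails setup
print(f"capture after 2 cycles: {two_cycles:+.1f} ns")  # positive: meets setup
```

With these invented numbers the round trip takes longer than one internal clock period, so sampling on the very next edge fails while sampling one cycle later passes. That is the part that differs from a true source-synchronous interface: the capture clock is the FPGA's own internal clock, not a clock that travels back alongside the data.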