I don't quite get it. The code has a latency of 46 cycles. So even if the rest of the system is pipelined, this module will still limit it to one output every 46 clocks. Am I right?
Either way, 46 cycles is a short interval, and using software to poll or delay does not seem appropriate. The software loop would need to be highly optimized, and any interrupt or deferred bus access could easily cause data loss. In my opinion, software timing is usable only if you don't need an exact clock count and the events are at least thousands of cycles apart.
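To put rough numbers on why a 46-cycle interval is too tight for software, here is a small Python sketch. The 100 MHz clock and the 2 us stall (standing in for an interrupt or a deferred bus access) are assumed figures for illustration only, not taken from your design, and outputs_during_stall is just a hypothetical helper name.

```python
def outputs_during_stall(interval_cycles, clock_mhz, stall_ns):
    """Count how many new outputs the hardware produces while software is stalled.

    Integer arithmetic only: cycles = ns * MHz / 1000.
    """
    stall_cycles = stall_ns * clock_mhz // 1000
    return stall_cycles // interval_cycles


# Assumed figures (not from the thread): 100 MHz clock, 2 us stall.
short_interval = outputs_during_stall(46, 100, 2000)     # one output every 46 cycles
long_interval = outputs_during_stall(46000, 100, 2000)   # one output every 46,000 cycles
print(short_interval, long_interval)
```

With these assumed figures, several outputs arrive during a single stall at a 46-cycle interval, so an unbuffered reader would miss data, while at an interval of tens of thousands of cycles the same stall fits comfortably inside one interval.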
In your case, if latency is not an issue, the module's output can be written into a FIFO and read out much later. Otherwise, route it straight to the output without any software intervention.
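If you go the FIFO route, a quick cycle-by-cycle model can give a feel for how deep the FIFO needs to be. This is only a sketch under made-up assumptions: one word is written every 46 cycles, the reader drains one word per cycle when it is free, and the reader is unavailable for the first 500 cycles of every 1000. The names peak_fifo_depth and reader_busy are hypothetical, and a real design would also need margin on top of whatever such a model suggests.

```python
from collections import deque


def peak_fifo_depth(total_cycles, produce_interval, reader_busy):
    """Simulate a FIFO cycle by cycle.

    A word is written every `produce_interval` cycles. The reader removes
    one word per cycle whenever `reader_busy(cycle)` is False.
    Returns (peak occupancy, number of words read).
    """
    fifo = deque()
    peak = 0
    words_read = 0
    for cycle in range(total_cycles):
        if cycle % produce_interval == 0:
            fifo.append(cycle)            # hardware writes a new word
            peak = max(peak, len(fifo))
        if fifo and not reader_busy(cycle):
            fifo.popleft()                # reader drains one word
            words_read += 1
    return peak, words_read


# Assumed reader behaviour: stalled for 500 cycles out of every 1000.
def reader_busy(cycle):
    return cycle % 1000 < 500


peak, words_read = peak_fifo_depth(10_000, 46, reader_busy)
print(peak, words_read)
```

In this model the peak occupancy is simply the number of words written during the longest reader stall, so the depth you need scales with the worst-case stall divided by the 46-cycle interval.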