I swapped out a defective CPLD in a USB Blaster so I could do some experiments with this.
The part is fairly small, so there isn't a lot you can do with it, but I did end up writing a USB-to-SPI bridge. I swapped the oscillator for a 10 MHz part (which I had in stock) so I could do everything in a single clock cycle.
More recently, I built a board with an FT245 and a Cyclone III (3c16), and did a proper bus-attached UART interface for it. This system has a much faster oscillator, so a single clock cycle is no longer long enough and I have to implement counter delays to stretch the FT245 strobes.
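To give a feel for what the counter delays work out to, here is a small Python sketch that converts a minimum strobe time into a whole number of clock cycles. The 50 ns figure is only a placeholder I am assuming for illustration; take the real minimum pulse widths from the datasheet for your FT245 variant. The function name is hypothetical.

```python
def delay_cycles(delay_ns, clock_hz):
    """Smallest whole number of clock cycles that lasts at least delay_ns.

    Integer ceiling division avoids floating-point rounding surprises.
    """
    return -(-(delay_ns * clock_hz) // 1_000_000_000)


# ASSUMPTION: 50 ns is a placeholder strobe width, not a quoted datasheet value.
STROBE_NS = 50

print(delay_cycles(STROBE_NS, 10_000_000))  # 10 MHz clock (100 ns period)
print(delay_cycles(STROBE_NS, 72_000_000))  # 72 MHz clock (about 13.9 ns period)
```

With the placeholder value, one 100 ns cycle at 10 MHz already covers the strobe, which is why the single-cycle approach worked on the CPLD, while a 72 MHz clock needs a counter that holds the strobe for several cycles.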
I could show the code, but it's fairly straightforward. I have two idle states, and the state machine toggles between them until the check condition for one of them is true. Note that all I/O from the FT245 is registered, to get the signals into the internal clock domain.
In the first IDLE state, I sample RXFn and PWRENn to see if they are both low. If they are, I initiate a read from the external FIFO and copy it to an internal FIFO.
In the second IDLE state, I check the internal transmit FIFO, and if it is not empty, I perform a write to the external FIFO. Obviously, during the write state I have to enable the output drivers.
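The two idle states described above can be sketched as a small behavioral model. This is Python rather than HDL, and it is only an illustration under assumptions: the state names, the class name, and the choice of which idle state follows a read or a write are mine, each transfer is collapsed into a single step (no counter delays or bus turn-around), and TXEn is left out because the description above does not mention it.

```python
from collections import deque

# Hypothetical state names, not taken from the real design.
IDLE_RX, READ, IDLE_TX, WRITE = "IDLE_RX", "READ", "IDLE_TX", "WRITE"


class Ft245Bridge:
    """Cycle-level model of the two-idle-state controller (a sketch only)."""

    def __init__(self):
        self.state = IDLE_RX
        self.rx_fifo = deque()  # internal receive FIFO
        self.tx_fifo = deque()  # internal transmit FIFO

    @property
    def drive_bus(self):
        # Output drivers are enabled only while in the write state.
        return self.state == WRITE

    def step(self, rxf_n, pwren_n, bus_in=0):
        """Advance one clock. Inputs are assumed to be already registered.

        Returns the byte written to the external FIFO, or None.
        """
        written = None
        if self.state == IDLE_RX:
            # Start a read only if RXFn and PWRENn are both low.
            self.state = READ if (rxf_n == 0 and pwren_n == 0) else IDLE_TX
        elif self.state == READ:
            self.rx_fifo.append(bus_in)  # external FIFO -> internal FIFO
            self.state = IDLE_TX         # ASSUMPTION: check TX side next
        elif self.state == IDLE_TX:
            # Start a write only if the internal transmit FIFO has data.
            self.state = WRITE if self.tx_fifo else IDLE_RX
        elif self.state == WRITE:
            written = self.tx_fifo.popleft()  # internal FIFO -> external FIFO
            self.state = IDLE_RX              # ASSUMPTION: check RX side next
        return written
```

A short usage example: with RXFn and PWRENn low, `step(0, 0)` moves the model into READ, and the following `step(1, 0, bus_in=0x41)` stores the byte in `rx_fifo`. Appending a byte to `tx_fifo` makes the model pass through WRITE the next time it reaches the second idle state, and `drive_bus` is true only during that state.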
There is some extra code to handle the bus turn-around, but functionally this is all that is happening. The internal bus clock is 72 MHz, and I have gotten pretty close to the theoretical max data rate for the FT245 using the UART.
I could probably get even closer with careful use of the Send Immediate pin, but so far, I haven't needed to use it.