Oh sorry, I didn't notice the part about the D$.
So once the word that caused the miss is loaded into the cache, the processor will proceed. That's the purpose of a critical-word-first cache: the line doesn't need to fill completely before the instructions can be used.
The data cache does not have the critical-word-first feature, so no matter where on the line the miss occurs, your code will stall until the whole data cache line has been filled. This is why there are options for the data cache line size (1, 2, 8 words): depending on your access pattern, an eight-word line might not be ideal.
The best case times would be as follows (assuming no system stalls):
With the I$, the wait time = memory latency + 1
With the D$, the wait time = memory latency + 9
If you simulate a design you should be able to capture these numbers graphically.
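
If it helps to play with the numbers before simulating, here is a rough sketch of the two best-case formulas above. A few things in it are my assumptions and not from any datasheet: the times are in clock cycles, the "+ 9" for the D$ is read as 8 words of line fill plus 1, and the same pattern is assumed to hold for the smaller line sizes. The function names are made up for illustration.

```python
def icache_miss_wait(memory_latency):
    """Best-case I$ miss wait: critical word first, so only the missed word is waited on."""
    return memory_latency + 1


def dcache_miss_wait(memory_latency, line_words=8):
    """Best-case D$ miss wait: the whole line must fill before the processor proceeds.

    Assumption: one cycle per word of the line plus one, which gives the
    "+ 9" figure for an eight-word line. Other line sizes are extrapolated.
    """
    return memory_latency + line_words + 1


if __name__ == "__main__":
    latency = 10  # hypothetical memory latency in cycles
    print("I$ miss wait:", icache_miss_wait(latency))
    for words in (1, 2, 8):
        print("D$ miss wait,", words, "word line:", dcache_miss_wait(latency, words))
```

Treat the results as a lower bound to compare against what the simulation shows, since any system stalls will add to them.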