With older memory technologies (I can't remember whether it changed with SDRAM or DDR), processors used to read a cache line using an 8-word burst that started with the word they wanted rather than the first word of the cache line, wrapping around at the end of the line (doing CAS-only memory cycles while holding RAS asserted).
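This gives lower memory latency for the word the processor is actually waiting on, since it arrives first instead of after the earlier words in the line.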
Modern memories directly support sequential burst access - but these won't wrap - so the burst must finish at the end of the cache line.
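I suspect a burst can start part-way through a cache line, so the CPU could still read the requested word first by doing two memory bursts: one from the requested word to the end of the line, then one from the start of the line up to the requested word.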
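To make the ordering concrete, here is a rough sketch in C of the word order for a wrapping burst, and of how the same line could be fetched as two non-wrapping bursts. It only models the ordering and is not how any particular memory controller does it. The names LINE_WORDS, wrapping_burst_order, struct burst and split_bursts are made up for illustration, and the 8-word line is assumed from the burst length mentioned above.

```c
#include <assert.h>

/* Words per cache line: an assumption, matching the 8-word burst above. */
#define LINE_WORDS 8u

/* Wrapping burst: start at the requested word and wrap at the end of the
 * line, so the requested word is always transferred first. */
void wrapping_burst_order(unsigned start, unsigned order[LINE_WORDS])
{
    assert(start < LINE_WORDS);
    for (unsigned i = 0; i < LINE_WORDS; i++)
        order[i] = (start + i) % LINE_WORDS;
}

/* One sequential (non-wrapping) burst: first word offset and length. */
struct burst {
    unsigned first;
    unsigned len;
};

/* Split a line fill into sequential bursts that never wrap.
 * The first burst runs from the requested word to the end of the line;
 * the second (if needed) covers the words before the requested one.
 * Returns the number of bursts written to out (1 or 2). */
unsigned split_bursts(unsigned start, struct burst out[2])
{
    assert(start < LINE_WORDS);
    out[0].first = start;
    out[0].len = LINE_WORDS - start;
    if (start == 0)
        return 1;               /* already aligned: one full-line burst */
    out[1].first = 0;
    out[1].len = start;
    return 2;
}
```

For a request that hits word 5, the wrapping burst delivers words 5, 6, 7, 0, 1, 2, 3, 4, while the split version does one burst of 3 words starting at word 5 and a second burst of 5 words starting at word 0. Either way the requested word comes back first; the two-burst version just pays for a second burst start.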
OTOH I went out of my way to use only tightly coupled memory and disabled both the instruction and data caches!