Jesse,
Thank you for this incredibly valuable info. Please, please add it to the documentation. I've already respun my board with a Stratix because of this, plus the bit-shifting problem (1 clock per bit without the hardware multiplier!). I'd hate to see this happen to someone else!
(At least say 1 clock if cached, 7+ if not.)
Based on this new info, I'm not sure the Stratix will help enough. I don't know exactly what the read overhead on our ColdFire is, but I suspect it is much less than a 7-clock minimum for non-cached reads. Data caching is really of little use for the many embedded applications that are always streaming or otherwise processing only new information (music, video, scanning, almost anything...). In fact, what is typically interesting about revisiting the same old data?
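To put rough numbers on the streaming case, here is a minimal sketch in C of the average cost per data read as a function of the cache hit rate. The 1-clock hit and 7-clock miss figures are the ones quoted in this thread; the function name and the sample hit rates are just my illustrative assumptions, not measurements.

```c
#include <assert.h>

/* Average clocks per data read for a cache hit rate between 0.0 and 1.0.
   hit_clocks and miss_clocks are inputs; 1 and 7 are the figures
   discussed in this thread, not measured values. */
double avg_read_clocks(double hit_rate, double hit_clocks, double miss_clocks)
{
    assert(hit_rate >= 0.0 && hit_rate <= 1.0);
    return hit_rate * hit_clocks + (1.0 - hit_rate) * miss_clocks;
}
```

For pure streaming data that is never revisited (hit rate 0), avg_read_clocks(0.0, 1.0, 7.0) is the full 7 clocks per read, and even a 50% hit rate only brings it down to 4.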
I wonder if there is a way to DMA into the data cache to get work packets to near 2 clocks (overhead + one clock for the DMA + one clock for the actual read)? Actually, I'm surprised the existing cache controller doesn't assume read-ahead and do this already.
I hope Altera will see how crucial fast memory access is. The good news is that anything that can be done will improve performance by 10%+ for each clock eliminated!
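As a rough illustration of where a figure like 10% per clock comes from, here is a small C sketch. It assumes a hypothetical inner loop that spends about 10 clocks per data item (say a 7-clock uncached read plus 3 clocks of other work); that loop length is my assumption for the example, not a measured number.

```c
#include <assert.h>

/* Percentage of loop time saved when clocks_removed clocks are eliminated
   from an inner loop that takes loop_clocks clocks per data item.
   The loop lengths used with this are hypothetical. */
double percent_saved(int loop_clocks, int clocks_removed)
{
    assert(loop_clocks > 0);
    assert(clocks_removed >= 0 && clocks_removed <= loop_clocks);
    return 100.0 * (double)clocks_removed / (double)loop_clocks;
}
```

With a 10-clock loop, percent_saved(10, 1) is 10%; with a tighter 8-clock loop, each clock removed is worth 12.5%, which is why I say 10%+.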
Thanks,
Ken