Hi James,
I would vote for absolute performance over fmax. I'm trying to think of an embedded app that would revisit old data (and so get a data cache hit). Whether you're processing audio, video, network packets, telemetry data, or some other data, why would you need to process it twice? (I'm sure there are some examples; I just can't think of any.)
In the meantime, is there any relief? Is there any way to DMA into the cache? What about read-ahead caching? It's very typical to run through a work packet of data in order.
Can we at least get rid of the cache miss penalty for the LD*IO instructions?
Thanks,
Ken