That's a good point. If you plan to build a design where the code runs entirely on-chip, I would recommend these two configurations to get the maximum performance possible:
During Development:
- Turn off the data cache
- Reduce the instruction cache to 512B
- Add a dual port on-chip RAM that will be pre-initialized with your code
- Hook up tightly coupled instruction and data masters to the dual port ram
Development Complete (only applicable if you don't plan on keeping the JTAG debug module):
- Reduce the instruction cache to 0B (this will remove the instruction master)
- Remove the JTAG debug module
- Regenerate the system
- Recompile the software
- Recompile the hardware
The CPU at this point will still have a data master but no instruction master. The only reason the instruction cache was present during the development cycle is that the instruction master needs to be connected to the JTAG debug module. I won't go into the details of why the instruction master is removed when you have no instruction cache, so just take my word for it.
Any time the data cache is removed or set to 4 B/line, the data master does not include the read data valid signal, so it cannot pipeline reads. If you have high-latency reads to perform, you might want to keep the data cache turned on (16 or 32 B/line) and let it perform the accesses, since it will pipeline the reads.
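To put rough numbers on the pipelining point, here is a back-of-the-envelope sketch rather than anything measured on hardware. It assumes a fixed read latency, 32-bit words (so a 32 B line is 8 reads), no arbitration stalls, and a slave that accepts one new read per cycle when the master can pipeline. The function names and the 10-cycle latency are made up for illustration.

```python
def blocking_read_cycles(num_reads: int, latency: int) -> int:
    """Cycles for a master that waits for each read to return
    before issuing the next one (no read data valid signal)."""
    if num_reads <= 0:
        return 0
    # Every read pays the full latency, one after another.
    return num_reads * latency


def pipelined_read_cycles(num_reads: int, latency: int) -> int:
    """Cycles for a master that issues one read per cycle and
    collects the data as it comes back (pipelined reads)."""
    if num_reads <= 0:
        return 0
    # The first word arrives after `latency` cycles, then one
    # more word arrives on each following cycle.
    return latency + (num_reads - 1)


if __name__ == "__main__":
    words_per_line = 32 // 4   # assumed: 32 B line, 4 B words
    latency = 10               # assumed: hypothetical slave latency
    print("blocking: ", blocking_read_cycles(words_per_line, latency))
    print("pipelined:", pipelined_read_cycles(words_per_line, latency))
```

Under those assumptions the gap grows with the latency of the slave, which is why the cache line fill tends to win for slow memories, while for low-latency on-chip RAM the difference is small.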