These don't directly address what you are looking for, but they are worth mentioning just in case:
1) If you don't want an instruction cache at all, you can instantiate the 'e' core. Note: as others have said, performance will drop as a result.
2) If you place the code that you don't want cached in a tightly coupled instruction memory, the instruction cache will not be accessed while the program counter is inside that memory region. Note: tightly coupled memories have single-cycle access times, so this won't help if what you were after is the higher latency of uncached fetches.
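For option 2, here is a rough sketch of how a function can be steered into a particular memory with the GCC `section` attribute. I'm assuming a Nios II GCC toolchain (hence the `__NIOS2__` check) and a linker script that maps a section called `.tcim` onto the tightly coupled instruction memory. The section name, the `TCIM_CODE` macro, and the `sum_words` function are all made up for illustration, so substitute whatever section your own linker script actually defines for that memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the linker script defines a section named ".tcim" that is
 * mapped onto the tightly coupled instruction memory. The real name
 * depends on how the memory is named in your system. */
#if defined(__NIOS2__)
#define TCIM_CODE __attribute__((section(".tcim")))
#else
#define TCIM_CODE /* other targets: no special placement */
#endif

/* Hypothetical routine whose instruction fetches should not go through
 * the instruction cache. It simply adds up an array of 32-bit words. */
TCIM_CODE uint32_t sum_words(const uint32_t *buf, size_t count)
{
    uint32_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += buf[i];
    }
    return total;
}
```

Everything without the attribute stays in normal memory and goes through the cache as usual, so only the routines you tag are affected. It's worth checking the linker map file afterwards to confirm the function really ended up in the tightly coupled memory.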
If you post why you need instruction fetches that bypass the instruction cache, you may get an answer closer to what you are looking for. The Cortex-M1 might also do what you need:
http://www.altera.com/devices/processor/arm/cortex-m1/m-arm-cortex-m1.html