With the TCM set for 'single clock' we see occasional memory read errors.
The code snippet below calculates the transmit CRC16 (the custom instruction is combinatorial and updates the CRC for a new byte).
If 'tx_src' points into the TCM both versions work.
If 'tx_src' points into SDRAM (no data cache) the '-' version generates correct TX data, but a consistently invalid CRC.
ldbu r6, 0(r7) # * tx_src, s
- ldhu r11, 42(r8) # <variable>.hdlc_tx_crc, crc
addi r7, r7, 1 # , tx_src, tx_src
+ ldhu r11, 42(r8) # <variable>.hdlc_tx_crc, crc
custom 1, r3, r6, r11 # , <anonymous>, s, crc
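
To confirm which bytes or CRC values go wrong, the hardware result could be compared against a software model of the per-byte update. The following is only a sketch: the post does not say which polynomial the custom instruction implements, so the reflected CCITT polynomial 0x8408 (as used for the HDLC FCS-16) and the 0xFFFF start value are assumptions, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of the per-byte CRC update.
 * ASSUMPTION: reflected CCITT polynomial 0x8408 (HDLC FCS-16 style);
 * the real custom instruction may use a different polynomial. */
static uint16_t crc16_update_sw(uint16_t crc, uint8_t s)
{
    crc ^= s;
    for (int bit = 0; bit < 8; bit++) {
        if (crc & 1u)
            crc = (uint16_t)((crc >> 1) ^ 0x8408u);
        else
            crc = (uint16_t)(crc >> 1);
    }
    return crc;
}

/* Run the model over a buffer, mirroring the transmit loop.
 * ASSUMPTION: the CRC starts at 0xFFFF and no final inversion is applied here. */
static uint16_t crc16_buf_sw(const uint8_t *tx_src, size_t len)
{
    uint16_t crc = 0xFFFFu;
    while (len--)
        crc = crc16_update_sw(crc, *tx_src++);
    return crc;
}

/* Sanity check of the model against the standard "123456789" test string. */
static void crc16_selftest(void)
{
    static const uint8_t msg[] = "123456789";
    assert(crc16_buf_sw(msg, 9) == 0x6F91u);
}
```

Running the same buffer once from the TCM and once from SDRAM through both the custom instruction and this model would show whether the data byte or the CRC operand is the one being read incorrectly.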