--- Quote Start ---
My problem with the performance counter block is that it is a 16-bit slave, so it takes a lot of accesses to do anything.
--- Quote End ---
I'm not sure about the history of the block, but at least in the files sitting on my hard drive right now, it is a 32-bit slave.
--- Quote Start ---
With care and a combinatorial custom instruction (or 2), it could be 3 instructions (I think).
--- Quote End ---
I'm not following you: I see two loads, one store, and one math function (4 instructions, or 128 total).
I'm not intimately familiar with the instruction set: does Nios have a double-load or add-and-store instruction? How did you compress it down to 3?
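To make my count concrete, here is a minimal C sketch of the kind of operation I have in mind. The function name `combine_and_store`, the pointer arguments, and the plain addition are all placeholders of mine, not the actual code under discussion; the addition just stands in for whatever the math step (or custom instruction) really is.

```c
#include <stdint.h>

/* Hypothetical stand-in for the operation being counted.
 * With separate load and store instructions, each commented
 * line below would map to roughly one instruction. */
static inline void combine_and_store(volatile uint32_t *dst,
                                     const volatile uint32_t *a,
                                     const volatile uint32_t *b)
{
    uint32_t x = *a;     /* load 1 */
    uint32_t y = *b;     /* load 2 */
    uint32_t r = x + y;  /* math step (placeholder for the real function) */
    *dst = r;            /* store */
}
```

Getting this down to 3 would seem to need one instruction that covers two of those four steps, which is what I'm asking about.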