Count the instructions in the loop:
1) Read of LED_TOGGLE for the io_write
2) Write of LED_TOGGLE to the hardware
3) Read of LED_TOGGLE for the invert
4) The invert itself
5) Write back of LED_TOGGLE
6) Unconditional branch
IIRC a Nios II/e takes about 5 clocks per instruction, so those 6 instructions cost at least 30 clocks.
The constant LED_PIO_DATA_BASE might also be rebuilt in a register on each loop iteration, which adds more instructions.
Add in any extra clocks for the Avalon write caused by your slave itself (wait states).
Make LED_TOGGLE a local variable (so it can be assigned to a register).
Compile with -O2 so that the compiler moves some calculations outside the loop.
That should get you to a three-instruction loop (write, invert, branch).
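
Something like the sketch below is what I mean. It is not your code: toggle_leds, led_reg and fake_led_reg are made-up names, and the plain volatile pointer stands in for whatever your io_write does with LED_PIO_DATA_BASE. The iteration count is only there so the sketch terminates when run on a PC; on the board you would use for (;;), which gives the unconditional branch counted above.

```c
#include <stdint.h>

/* Hypothetical stand-in for the LED PIO data register so the sketch can run
   on a host. On the board this would be the register at LED_PIO_DATA_BASE. */
static volatile uint32_t fake_led_reg;

/* Write a toggling pattern to the register `iterations` times. */
static void toggle_leds(volatile uint32_t *led_reg, unsigned iterations)
{
    /* Local variable: the compiler can keep it in a CPU register, so the
       loop needs no loads or stores for it. */
    uint32_t led_toggle = 0;

    while (iterations--) {
        *led_reg = led_toggle;     /* the one write to the hardware */
        led_toggle = ~led_toggle;  /* the invert, register to register */
    }
}
```

Call it as toggle_leds(&fake_led_reg, 4) on a PC. Built with -O2, the register address is computed once before the loop, and only the write, the invert and the loop branch stay inside it.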