I think you have a misconception about pipelines. A pipeline is unavoidable, especially with FPGA technology; without one, performance suffers badly. It is also not as deep as you might think: 5 stages is about standard, and the fast Nios version has one more.
The deeper the pipeline, the bigger the branch penalty.
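To make the branch-penalty trade-off concrete, here is a rough back-of-the-envelope model, not a description of any specific core. It assumes an ideal CPI of 1, no branch prediction, and that every taken branch throws away `penalty_cycles` already-fetched instructions. The 20% branch / 60% taken workload and the "penalty = depth - 1" worst case are hypothetical numbers chosen for illustration.

```python
def effective_cpi(branch_fraction: float, taken_fraction: float, penalty_cycles: int) -> float:
    """Average cycles per instruction for an idealised in-order pipeline.

    Assumption: CPI is 1, except that every taken branch flushes
    `penalty_cycles` instructions (no branch prediction at all).
    """
    return 1.0 + branch_fraction * taken_fraction * penalty_cycles


if __name__ == "__main__":
    # Hypothetical workload: 20% of instructions are branches, 60% of those are taken.
    for depth in (3, 5, 6):
        # Worst-case assumption: the branch only resolves in the last stage,
        # so depth - 1 fetched instructions are discarded on a taken branch.
        penalty = depth - 1
        print(f"{depth}-stage pipeline: CPI ~ {effective_cpi(0.2, 0.6, penalty):.2f}")
```

With these made-up numbers the average CPI grows from about 1.24 (3 stages) to 1.6 (6 stages); a deeper pipeline usually buys a higher clock, so whether it pays off depends on how much faster it can run.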
MicroBlaze has only 3, I think (not sure). Lattice's Mico32 also has 5, if I remember correctly.
However, you have the option to run Nios II without a pipeline (the small version). It is then rather slow (~6 cycles per instruction).
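A quick throughput comparison shows why ~6 cycles per instruction hurts. This sketch assumes both cores run at the same hypothetical 100 MHz clock and assigns the pipelined core a hypothetical average CPI of 1.3 (including stalls and branch penalties); real figures depend on the device, the core configuration and the code being run.

```python
def mips(clock_mhz: float, cpi: float) -> float:
    """Millions of instructions per second for a given clock and average CPI."""
    return clock_mhz / cpi


if __name__ == "__main__":
    clock_mhz = 100.0  # hypothetical clock, assumed equal for both cores
    unpipelined = mips(clock_mhz, 6.0)  # ~6 cycles per instruction, as quoted for the small core
    pipelined = mips(clock_mhz, 1.3)    # hypothetical CPI for a 5-stage pipeline
    print(f"unpipelined: {unpipelined:.1f} MIPS")
    print(f"pipelined:   {pipelined:.1f} MIPS")
    print(f"speed-up:    {pipelined / unpipelined:.1f}x")
```

Under these assumptions the pipelined core is roughly 4.6 times faster; the unpipelined core's advantage is its much smaller area, not its speed.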
It all depends on what you really need, and what you define as a "free, ideally open-source soft-core". In my case, I have started developing a 32-bit processor many times, but each time I stopped because I didn't have the knowledge (and time) to port gcc. The toolchain is usually the bottleneck: without a decent compiler you can't do anything.
Stefaan