Rick,
I am no expert when it comes to floating point, but I think there is a tool at your disposal that can help answer this: the .objdump file. I'm assuming you know how to generate one for Nios II; it will show you what happens inside the GNU FP libraries. Is it possible to get the equivalent for the ARM processor and compare the two?
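To make that concrete, here is a rough sketch of the kind of small kernel I would compile for both targets and then disassemble. The function name and the tool names in the comments are my own assumptions, not anything from your project; substitute whatever your toolchains provide.

```c
/* fp_kernel.c: hypothetical test kernel for comparing FP code generation.
 *
 * Compile this file for each target, then disassemble the result, e.g.
 * (tool prefixes are assumptions and depend on your toolchain install):
 *   nios2-elf-objdump -d fp_kernel.o
 *   arm-none-eabi-objdump -d fp_kernel.o
 *
 * On a target without FP hardware, expect the multiply and add below to
 * become calls into libgcc soft-float routines such as __mulsf3 and
 * __addsf3 instead of single instructions.
 */
#include <stddef.h>

/* Multiply each sample by gain and return the running sum. */
float scale_and_accumulate(const float *x, size_t n, float gain)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += x[i] * gain;   /* one FP multiply and one FP add per sample */
    }
    return sum;
}
```

Comparing how many instructions (or library calls) each target spends per loop iteration should show where the time is going.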
Don't forget that several users have posted various bits of FP hardware to this forum that should integrate nicely with Nios II. Perhaps another part of your analysis could be to wire one of these up, compare performance again, and then factor in the cost (the additional LE usage) to keep the comparison fair?