If you are tight on memory, it is worth making sure your code isn't using any library functions that you don't explicitly need.
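As a rough illustration of avoiding a library call, here is a minimal sketch of a hand-rolled hex formatter that could stand in for a printf("%08x") style call. The name fmt_hex32 is made up for this example and is not part of any BSP or Altera API. Whether it actually saves space depends on what else in your build still references stdio, so check the .map file afterwards.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from any BSP): write v as exactly 8 lowercase
   hex digits followed by a terminating NUL. The caller must supply a
   buffer of at least 9 bytes. Returns the number of digits written. */
size_t fmt_hex32(char *out, uint32_t v)
{
    static const char digits[] = "0123456789abcdef";
    size_t n = 0;

    /* Walk the value one nibble at a time, most significant first.
       Only shifts and masks are used, so no divide routine is needed. */
    for (int shift = 28; shift >= 0; shift -= 4)
        out[n++] = digits[(v >> shift) & 0xfu];

    out[n] = '\0';
    return n;
}
```

The resulting buffer can then be pushed out through whatever byte-output routine you already have, one character at a time.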
IIRC even the 'small' BSP has a lot of extra stuff, and the normal one is ridiculous.
Altera ought to supply a 'minimal' example that contains absolutely nothing that isn't strictly necessary, and is designed for separate tightly coupled instruction and data memories (no caches).
(It is a shame you can't connect the boot code (JTAG or EPCS) as tightly coupled instruction memory.)
Make sure you are compiling everything with -O2 (or -O3), not the default of no optimisation, which generates a lot more code.
As well as the .map file, nios2-elf-objdump can show the symbol tables (-t) and disassembly (-d) of any of the object files.