I'd expect the stack to start at the highest address of the data memory area and grow downwards, so the amount of memory available for the stack is whatever the rest of the system leaves free!
If you are squeezing code into a small space, you probably need to be careful not to include much (if any) of libc - so check the namelist of your ELF image in case anything has been pulled in. You probably want to stop malloc() and friends being included, which may constrain which Altera libs you specify. I'm not sure that their small libs don't still call malloc() during initialisation.
If you've managed to exclude malloc() from your build (I don't know if Altera's low level JTAG UART code needs malloc() - but you can write functions for tracing strings and integers that don't) then the only initialisation you need is setting %sp and %gp (and zeroing .bss - which can be done from C).
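As a rough sketch of what I mean by malloc-free tracing and zeroing .bss from C (not Altera's code): trace_putc(), trace_buf and zero_words() are names I've made up for illustration. Here trace_putc() just fills a static buffer so the thing can be run on a host; on the target you'd replace its body with whatever writes a byte to your UART. The linker symbol names in the comment are also assumptions - use whatever your linker script defines.

```c
#include <stdint.h>

/* Hypothetical output hook. On the target this would write one byte to
 * the UART; here it appends to a static buffer (no malloc, no stdio). */
#define TRACE_BUF_SIZE 128
static char trace_buf[TRACE_BUF_SIZE];
static unsigned trace_len;

static void trace_putc(char c)
{
    if (trace_len < TRACE_BUF_SIZE - 1) {
        trace_buf[trace_len++] = c;
        trace_buf[trace_len] = '\0';
    }
}

/* Trace a NUL-terminated string. */
void trace_str(const char *s)
{
    while (*s)
        trace_putc(*s++);
}

/* Trace a 32-bit value as 8 hex digits (shifts and masks only). */
void trace_hex(uint32_t v)
{
    static const char digits[] = "0123456789abcdef";
    int shift;

    for (shift = 28; shift >= 0; shift -= 4)
        trace_putc(digits[(v >> shift) & 0xf]);
}

/* Trace a signed 32-bit value in decimal. Note the divide: without a
 * hardware divider this may pull in a small libgcc helper. */
void trace_dec(int32_t v)
{
    char tmp[10];               /* 4294967295 has 10 digits */
    unsigned n = 0;
    uint32_t u = (uint32_t)v;

    if (v < 0) {
        trace_putc('-');
        u = 0u - u;             /* negate in unsigned arithmetic, so INT32_MIN is safe */
    }
    do {
        tmp[n++] = (char)('0' + u % 10);
        u /= 10;
    } while (u != 0);
    while (n > 0)
        trace_putc(tmp[--n]);
}

/* Zero the words in [start, end). The volatile pointer is there to stop
 * the compiler turning the loop into a call to memset().
 * Assumed usage on the target, with hypothetical linker symbols:
 *     extern uint32_t __bss_start[], __bss_end[];
 *     zero_words(__bss_start, __bss_end);
 */
void zero_words(uint32_t *start, uint32_t *end)
{
    volatile uint32_t *p = start;

    while (p < (volatile uint32_t *)end)
        *p++ = 0;
}
```

Something like trace_str("sp="); trace_hex(value); is then enough for debugging, and none of it should drag in stdio or malloc().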
If you only have on-chip memory, then it is probably worth using tightly coupled I & D memory (you'll need a minimal I-cache to use the JTAG or EPCS loader). When I started doing that I had to use a non-standard linker script to get the program headers to load the data directly at the required target addresses - rather than adding code to copy initialised data from the end of the code to the target address.
You can also reduce the code size by putting all data memory and IO within a 64k range and marking everything as .sdata or .sbss so that the compiler generates %gp relative addresses. You might need to rebuild the compiler (with my patches from the wiki) in order to get %gp relative addressing for structure members.