--- Quote Start ---
A simple hello world program with a printf takes 4k of onchip mem... that's impossible, right?
--- Quote End ---
That can be quite normal: a complete printf implementation requires a lot of code. You must also account for the driver code that redirects the printf output string to the stdout device (e.g. the JTAG UART).
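To see where the size goes: printf has to parse format strings at run time and support every conversion (integers in several bases, floats, width, padding, and so on), even if your program only ever prints one constant string. If all you need is strings and unsigned integers, a hand-rolled routine covers that in a few dozen lines. The sketch below is hypothetical: the names `mini_sink`, `mini_putc`, `mini_puts` and `mini_putu` are illustrative and not part of any Nios II or HAL library, and it writes into a caller-supplied buffer instead of a real stdout device, because the device driver part is platform-specific.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical output sink: appends characters to a fixed buffer.
   On real hardware, mini_putc is where a device driver write
   (e.g. to a JTAG UART) would go instead. */
typedef struct {
    char  *buf;
    size_t cap;   /* total capacity, including the terminating '\0' */
    size_t len;   /* characters written so far */
} mini_sink;

void mini_init(mini_sink *s, char *buf, size_t cap)
{
    assert(buf != NULL && cap > 0);
    s->buf = buf;
    s->cap = cap;
    s->len = 0;
    buf[0] = '\0';
}

/* Append one character; drop it silently if the buffer is full,
   always keeping room for the terminating '\0'. */
void mini_putc(mini_sink *s, char c)
{
    if (s->len + 1 < s->cap) {
        s->buf[s->len++] = c;
        s->buf[s->len] = '\0';
    }
}

/* Write a NUL-terminated string: no format parsing at all. */
void mini_puts(mini_sink *s, const char *str)
{
    while (*str != '\0')
        mini_putc(s, *str++);
}

/* Write an unsigned integer in decimal, the only "conversion" supported. */
void mini_putu(mini_sink *s, unsigned long v)
{
    char tmp[20];   /* enough for a 64-bit unsigned long (at most 20 digits) */
    size_t n = 0;
    do {
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    while (n > 0)
        mini_putc(s, tmp[--n]);   /* digits were produced least-significant first */
}
```

For example, `mini_puts(&s, "Hello, world "); mini_putu(&s, 42);` leaves `"Hello, world 42"` in the buffer. The trade-off is deliberate: supporting only what the program uses keeps the code small, at the cost of losing printf's generality.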
Did you try the compiler option -Os? It tells the compiler to optimise for code size.
Anyway, if your 4k of code doesn't fit in a 10k memory, the problem is probably the stack/heap or other data memory required by the stdio library and driver.