I don't know why you reserve 1M of memory (I assume it is for the loader, since 1M is too small to hold the bin file). I would prefer to run the loader from flash and leave all of the memory to the kernel.
The fact that the clone system call goes astray suggests that your exception handler is not in the right place. I just learned that the current kernel port assumes the exception offset (measured from the start of the SDRAM) is 0x20, which is the default value in the core. In your case, since you moved the kernel 1M forward, you need to change that value in the core to 0x100020 (0x100000 for the 1M shift plus the default 0x20).
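In case the arithmetic is useful, here is a minimal C sketch of how the new offset is derived. The macro and function names are made up for illustration and do not come from the kernel port, and I am assuming the 1M shift is exactly 0x100000 bytes.

```c
#include <stdint.h>

/* Default exception offset from the start of SDRAM, as assumed by the port. */
#define DEFAULT_EXCEPTION_OFFSET 0x20u

/* Hypothetical name: how far the kernel was moved forward (1M = 0x100000). */
#define KERNEL_SHIFT (1u << 20)

/* The exception offset to set in the core is the kernel shift plus the default offset. */
static uint32_t exception_offset(uint32_t kernel_shift)
{
    return kernel_shift + DEFAULT_EXCEPTION_OFFSET;
}

/* Compile-time check of the value quoted above. */
_Static_assert(KERNEL_SHIFT + DEFAULT_EXCEPTION_OFFSET == 0x100020u,
               "1M shift plus 0x20 should be 0x100020");
```

If the loader ran from flash and the kernel stayed at the start of the SDRAM, the shift would be 0 and the default 0x20 would apply unchanged.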
Hope this helps,