The code you based your design on allocates its memory buffers and SGDMA descriptors on the heap. So not only does the processor need to be connected to the same memory as the SGDMA, you also need to make sure you place the heap section in the memory where you want the buffers and descriptors to live.
So if you hook up the Nios II data master to the on-chip memory and place the heap section in the on-chip memory, the transfer should work fine. You would then place .text, .rodata, .rwdata, and the stack in the DDR SDRAM.
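A quick way to confirm the heap really landed where you intended is to check a malloc() result against the address range of the shared memory. This is only a sketch: SHARED_MEM_BASE and SHARED_MEM_SPAN are made-up placeholder values (on a real system you would take the base and span of your on-chip memory from the generated system.h), and range_in_region() and heap_block_is_shared() are helper names I invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values (assumption): replace with the base/span of the
 * memory that both the CPU data master and the SGDMA can reach. */
#define SHARED_MEM_BASE 0x00020000u
#define SHARED_MEM_SPAN 0x00008000u

/* Returns 1 if [addr, addr + len) lies entirely inside
 * [base, base + span), 0 otherwise. Written to avoid overflow. */
int range_in_region(uintptr_t addr, size_t len, uintptr_t base, size_t span)
{
    if (addr < base) {
        return 0;
    }
    if (len > span) {
        return 0;
    }
    return (addr - base) <= (span - len);
}

/* Hypothetical helper: checks a heap block against the shared memory. */
int heap_block_is_shared(const void *p, size_t len)
{
    assert(p != NULL);
    return range_in_region((uintptr_t)p, len,
                           SHARED_MEM_BASE, SHARED_MEM_SPAN);
}
```

On the target you would call heap_block_is_shared() on each pointer returned by malloc() before handing it to the SGDMA. If it returns 0, the heap is still sitting in a memory the SGDMA may not be able to see.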
Alternatively, you could hack up the code to place the data buffers and descriptors anywhere you want and remove the malloc() calls that do this today (malloc() allocates memory from the heap).
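If you go the no-malloc() route, one way to do it is with statically allocated arrays tagged with a GCC section attribute. Treat the following as a rough sketch with several assumptions: ".onchip_memory" is a hypothetical section name (use whatever section your BSP linker script defines for the shared memory), demo_descriptor is a stand-in for the real descriptor type from the SGDMA driver header, the 32-byte size and alignment are what I would expect the SGDMA to want, and the __NIOS2__ check assumes your compiler defines that macro. The buffer size and descriptor count are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* On the Nios II toolchain, ask the linker to put the object in a named
 * section. ".onchip_memory" is a hypothetical name (assumption). On any
 * other compiler target the macro expands to nothing. */
#if defined(__NIOS2__)
#define IN_SHARED_MEM __attribute__((section(".onchip_memory")))
#else
#define IN_SHARED_MEM
#endif

#define BUF_BYTES  1024u  /* arbitrary example size */
#define NUM_DESC   4u     /* arbitrary example count */
#define DESC_ALIGN 32u    /* assumed descriptor alignment */

/* Stand-in for the driver's descriptor type (assumption: 32 bytes). */
typedef struct {
    uint8_t raw[32];
} demo_descriptor;

static_assert(sizeof(demo_descriptor) == 32, "descriptor must be 32 bytes");

/* Data buffer placed statically instead of with malloc(). */
static uint8_t tx_buf[BUF_BYTES]
    IN_SHARED_MEM __attribute__((aligned(4)));

/* Descriptor chain, with one extra slot for the descriptor that
 * terminates the chain. */
static demo_descriptor desc_chain[NUM_DESC + 1]
    IN_SHARED_MEM __attribute__((aligned(DESC_ALIGN)));

uint8_t *demo_tx_buffer(void)
{
    return tx_buf;
}

demo_descriptor *demo_descriptors(void)
{
    return desc_chain;
}

/* Returns 1 if p is aligned to a bytes, 0 otherwise. */
int demo_is_aligned(const void *p, size_t a)
{
    return ((uintptr_t)p % a) == 0;
}
```

With this approach the heap can stay wherever it is convenient, since only the tagged arrays need to sit in the memory the SGDMA can reach.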
So I would try the system/software changes first before running the simulation. I think this is just a matter of making sure the CPU and SGDMA both have visibility to the shared memory.