That's a common mistake; anyone who has worked with DMAs would be lying if they claimed they had never done that :)
If you create a separate linker region for the memory you want to share with the external master, you can keep the linker from placing any of the CPU's code or data there. I forget how to do that by hand in the linker script, but the Nios II BSP editor should make it easier since it's graphical.
* edit * Missed that part about the template. I wasn't aware the templates didn't have byte enables (oops). For now it would just be a matter of adding one more signal, 'byteenable', to the template and mapping it to the 'byteenable' Avalon-MM signal type. Then in the HDL you would just assign all ones to it. The width of the signal would be "DATA_WIDTH/8", so to assign a variable-width set of all ones you could do this:
assign byteenable = {(DATA_WIDTH/8){1'b1}}; // takes '1' and replicates it (DATA_WIDTH/8) times
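
For context, here is a minimal sketch of how that assignment might sit next to the new port. The module name and the stripped-down port list are made up for illustration (the real template has its own names and many more signals), and it assumes DATA_WIDTH is a multiple of 8:

```verilog
// Hypothetical wrapper, not the actual template: only the added
// byteenable port is shown. Assumes DATA_WIDTH is a multiple of 8.
module master_byteenable_sketch #(
    parameter DATA_WIDTH = 32
) (
    // One byte-enable bit per byte lane of the data bus.
    output wire [(DATA_WIDTH/8)-1:0] byteenable
);
    // Replicate 1'b1 once per byte lane so every lane is always enabled.
    assign byteenable = {(DATA_WIDTH/8){1'b1}};
endmodule
```

With DATA_WIDTH = 32 this gives a 4-bit byteenable driven to 4'b1111, i.e. every access is a full-word access.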