When performing byte accesses, you should still align the address to a 4-byte boundary for a 32-bit master and let the byte enables select which byte is written. For example, to do sequential byte writes you would use these combinations:
Address   Byte enables (binary)
0         0001
0         0010
0         0100
0         1000
4         0001
4         0010
etc.
So in the sequence above, byte addresses 0, 1, 2, 3, 4, 5, etc. are written one byte at a time.

In my own masters I typically mask off the address LSBs based on the width of the master, so for a 32-bit master I would AND the address with 32'hFFFFFFFC. Then I decode and shift the byte enables based on the width and alignment of the access (hard to explain in text; see the write master in the modular SGDMA on the alterawiki for details).
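Here's a rough Python model of that masking and byte-enable decode, just to show the arithmetic. It is not the SGDMA write master's code. The function name `avalon_byte_access` and the "must fit in one word" check are my own assumptions, and it assumes little-endian byte lanes (byteenable bit 0 selects data bits 7:0).

```python
from typing import Tuple


def avalon_byte_access(byte_addr: int, size: int, width_bytes: int = 4) -> Tuple[int, int]:
    """Return (word-aligned address, byteenable mask) for an access of `size` bytes.

    Hypothetical helper: assumes little-endian byte lanes and that the access
    does not cross a word boundary of a master that is `width_bytes` wide.
    """
    if width_bytes <= 0 or width_bytes & (width_bytes - 1):
        raise ValueError("master width must be a power of two bytes")
    offset = byte_addr & (width_bytes - 1)   # byte lane the access starts in
    if size <= 0 or offset + size > width_bytes:
        raise ValueError("access must fit inside one word")
    aligned = byte_addr & ~(width_bytes - 1)  # same as & 32'hFFFFFFFC for a 32-bit master
    enables = ((1 << size) - 1) << offset     # `size` ones, shifted into the right lanes
    return aligned, enables


# Sequential single-byte writes on a 32-bit master
for addr in range(6):
    a, be = avalon_byte_access(addr, 1)
    print(a, format(be, "04b"))
```

The loop reproduces the address/byte-enable table above. A 16-bit write to byte address 6 would come out as address 4 with byte enables 1100.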