Note that a read is always 32 bits wide, regardless of your target.
If, for example, you have an 8-bit slave, you will get 4 cycles (4 x 8 = 32).
If the slave is 16 bits wide, you will get 2 cycles (2 x 16 = 32).
In fact, if your software does a char (8-bit) read, this leads to a 32-bit read, and the 24 bits that are not wanted are ignored.
The only difference is a write: an 8-bit slave will see just 1 cycle if the write is 8 bits wide.
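The cycle counts above are simple arithmetic. Here is a minimal sketch in C, assuming a 32-bit master and a narrower slave. The function names are made up for illustration and are not part of any Altera API. The case of a write that is wider than the slave is my assumption; the numbers above only cover the 8-bit write to an 8-bit slave.

```c
#include <assert.h>

/* Width of the master's data path in bits (32, as described above). */
#define MASTER_WIDTH_BITS 32u

/* Hypothetical helper: slave-side cycles for a read.
 * A read always fetches the full 32-bit word, whatever the software asked for. */
unsigned read_cycles(unsigned slave_width_bits)
{
    assert(slave_width_bits == 8u || slave_width_bits == 16u ||
           slave_width_bits == 32u);
    return MASTER_WIDTH_BITS / slave_width_bits;
}

/* Hypothetical helper: slave-side cycles for a write.
 * Only the bytes actually written are transferred.
 * Assumption: a write wider than the slave is split into several cycles. */
unsigned write_cycles(unsigned slave_width_bits, unsigned access_width_bits)
{
    assert(slave_width_bits == 8u || slave_width_bits == 16u ||
           slave_width_bits == 32u);
    assert(access_width_bits == 8u || access_width_bits == 16u ||
           access_width_bits == 32u);
    if (access_width_bits <= slave_width_bits) {
        return 1u;
    }
    return access_width_bits / slave_width_bits;
}
```

So read_cycles(8) gives 4 and read_cycles(16) gives 2, while write_cycles(8, 8) gives 1.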
One of the Avalon documents says that a master must assert all byteenables for a read cycle!
This is the reason why it is currently nearly impossible to connect an existing Profibus chip (8-bit) to the Avalon switch fabric. Some registers inside the chip react to a read, even if that access was not intended by the software but generated by the hardware.
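To make the problem concrete, here is a small software model in C. It is only a sketch under assumptions: the chip layout, the clear-on-read status register at offset 1, and all names are invented for illustration and do not describe any real Profibus chip.

```c
#include <stdint.h>

#define CHIP_REGS  4u
#define STATUS_REG 1u   /* hypothetical register that is cleared when read */

/* Hypothetical model of an 8-bit peripheral with a read side effect. */
struct chip {
    uint8_t  regs[CHIP_REGS];
    unsigned reads_seen;          /* counts every read cycle the chip sees */
};

/* One 8-bit read cycle as the chip sees it. */
uint8_t chip_read(struct chip *c, unsigned addr)
{
    unsigned idx = addr % CHIP_REGS;
    uint8_t value = c->regs[idx];
    c->reads_seen++;
    if (idx == STATUS_REG) {
        c->regs[STATUS_REG] = 0;  /* side effect: reading clears the register */
    }
    return value;
}

/* What the fabric does for a software char read, as described above:
 * it reads the whole aligned 32-bit word and keeps only the wanted byte. */
uint8_t fabric_read_char(struct chip *c, unsigned addr)
{
    unsigned base = addr & ~3u;
    uint8_t wanted = 0;
    for (unsigned i = 0; i < 4u; i++) {
        uint8_t value = chip_read(c, base + i);
        if (base + i == addr) {
            wanted = value;
        }
    }
    return wanted;
}
```

In this model a char read of register 0 returns the right byte, but the chip has seen 4 read cycles and the status register at offset 1 has been cleared without the software ever asking for it.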
Have you monitored your custom component with SignalTap?
In the beginning I had readdata as an n-bit wide register. This led to the problem that the first cycle with chipselect loads the readdata register and the next cycle is needed to make the output available, so I had to insert 1 wait state. Furthermore, check the SOPC settings of your custom component.
Nowadays readdata is a wire driven by combinational logic, and each access is only 1 clock cycle long, regardless of whether it is a read or a write.
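The difference between the two variants can be sketched as a tiny cycle model in C. This is not HDL and not my actual component, just an illustration under the assumption that the master samples readdata at the clock edge that ends the access; all names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical slave with 4 registers and a registered readdata output. */
struct slave {
    uint32_t regs[4];
    uint32_t readdata_reg;   /* only used by the registered variant */
};

/* Registered readdata: at each clock edge the master samples the value
 * that was in the register BEFORE the edge, while the register loads
 * the addressed value at that same edge. */
uint32_t read_registered(struct slave *s, unsigned addr, unsigned wait_states)
{
    uint32_t sampled = 0;
    for (unsigned cycle = 0; cycle <= wait_states; cycle++) {
        sampled = s->readdata_reg;             /* value seen by the master */
        s->readdata_reg = s->regs[addr % 4u];  /* register updates at the edge */
    }
    return sampled;
}

/* Combinational readdata: the output follows the address within the
 * same cycle, so no wait state is needed. */
uint32_t read_combinational(const struct slave *s, unsigned addr)
{
    return s->regs[addr % 4u];
}
```

With 0 wait states the registered variant returns whatever was left in the register, with 1 wait state it returns the addressed value, and the combinational variant returns it immediately.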
Michael Schmitt