You'll probably need to latch the data from the USB interface's parallel data lines and generate Avalon master bus cycles based on the latched values.
In some senses a UART interface might make this easier: split each UART data byte into two 4-bit fields, and use one as the 'command' and the other as the 'data'.
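As a rough illustration of the byte split, here is a host-side sketch in Python. It assumes the command sits in the upper nibble and the data in the lower nibble. That choice is arbitrary and not fixed by anything above, and `pack`/`unpack` are hypothetical helper names.

```python
def pack(cmd, data):
    """Combine a 4-bit command and a 4-bit data field into one UART byte.

    Assumption: command in the upper nibble, data in the lower nibble.
    """
    if not (0 <= cmd <= 0xF and 0 <= data <= 0xF):
        raise ValueError("cmd and data must each fit in 4 bits")
    return (cmd << 4) | data


def unpack(byte):
    """Split a received UART byte back into (command, data)."""
    return (byte >> 4) & 0xF, byte & 0xF
```

For example, `pack(0x2, 0xA)` gives `0x2A`, and `unpack(0x2A)` gives back `(2, 10)`.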
So a memory write would require up to 8 bytes to set the address (one nibble per byte for a 32-bit address), 8 bytes for the data, and a final 'write' command (which might also increment the address, and might carry the last data nibble).
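A minimal sketch of the write sequence, assuming hypothetical command codes (`CMD_ADDR`, `CMD_DATA`, `CMD_WRITE`), most-significant nibble first, and the variant where the write command auto-increments the address but does not carry a data nibble. The `WriteDecoder` class is only a behavioural model of what the FPGA-side logic would do, with a dict standing in for the Avalon slave. It is not HDL.

```python
# Hypothetical command codes carried in the upper nibble of each UART byte.
CMD_ADDR = 0x1   # shift the data nibble into the address register
CMD_DATA = 0x2   # shift the data nibble into the data register
CMD_WRITE = 0x3  # do an Avalon write, then auto-increment the address


def nibbles_msb_first(value):
    """Split a 32-bit value into 8 nibbles, most significant first."""
    return [(value >> shift) & 0xF for shift in range(28, -4, -4)]


def encode_write(addr, data):
    """Host side: build the 17-byte stream for one 32-bit memory write."""
    stream = [(CMD_ADDR << 4) | n for n in nibbles_msb_first(addr)]
    stream += [(CMD_DATA << 4) | n for n in nibbles_msb_first(data)]
    stream.append(CMD_WRITE << 4)  # data nibble unused in this variant
    return bytes(stream)


class WriteDecoder:
    """Behavioural model of the FPGA side: turns bytes into write cycles."""

    def __init__(self):
        self.addr = 0
        self.data = 0
        self.memory = {}  # stands in for the Avalon slave

    def feed(self, stream):
        for byte in stream:
            cmd, nibble = byte >> 4, byte & 0xF
            if cmd == CMD_ADDR:
                self.addr = ((self.addr << 4) | nibble) & 0xFFFFFFFF
            elif cmd == CMD_DATA:
                self.data = ((self.data << 4) | nibble) & 0xFFFFFFFF
            elif cmd == CMD_WRITE:
                self.memory[self.addr] = self.data
                self.addr = (self.addr + 4) & 0xFFFFFFFF  # next word
            # any other command code is ignored in this sketch
```

Feeding `encode_write(0x1000, 0xDEADBEEF)` into a `WriteDecoder` stores `0xDEADBEEF` at `0x1000` and leaves the address register at `0x1004`. With auto-increment, each further consecutive word needs only the 8 data bytes plus the write byte.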
A read would be much the same, except that the FPGA would send 8 data bytes back to the host.
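On the host side, a read could look like the following sketch. It assumes a hypothetical `CMD_READ` code, and it assumes each of the 8 reply bytes carries one data nibble in its low 4 bits, most significant nibble first. The upper nibble of each reply byte is ignored here, so the FPGA could use it as a tag.

```python
CMD_ADDR = 0x1  # hypothetical: shift nibble into the address register
CMD_READ = 0x4  # hypothetical: do an Avalon read, reply with 8 bytes


def encode_read(addr):
    """Host side: 8 address bytes followed by the read command."""
    stream = [(CMD_ADDR << 4) | ((addr >> s) & 0xF) for s in range(28, -4, -4)]
    stream.append(CMD_READ << 4)
    return bytes(stream)


def decode_reply(reply):
    """Reassemble 8 reply bytes (one nibble each, MSB first) into a word."""
    if len(reply) != 8:
        raise ValueError("expected 8 reply bytes")
    value = 0
    for byte in reply:
        value = (value << 4) | (byte & 0xF)  # upper nibble ignored
    return value
```

For example, the reply bytes `0D 0E 0A 0D 0B 0E 0E 0F` decode to `0xDEADBEEF`.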
That would be moderately slow, but probably not impossibly slow!
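To put a rough number on "moderately slow": assume 115200 baud with 8N1 framing (10 bits on the wire per byte). Neither figure comes from the post. Under those assumptions, a 17-byte write gives about 11520 / 17 ≈ 678 word writes per second, or roughly 2.7 KB/s of payload. With auto-increment, sequential writes need only 9 bytes each, which gives 1280 writes per second.

```python
def writes_per_second(baud, bytes_per_write, bits_per_byte=10):
    """Rough upper bound on 32-bit writes per second over a UART link.

    bits_per_byte=10 assumes 8N1 framing (start bit + 8 data + stop bit).
    Ignores host-side latency and any gaps between bytes.
    """
    return baud / bits_per_byte / bytes_per_write


full = writes_per_second(115200, 17)       # address + data + write command
sequential = writes_per_second(115200, 9)  # data + write, auto-increment
```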
The shared memory access isn't an issue. All the internal memory is dual-ported and the Avalon 'bus' does arbitration.