Some general observations:
1) I don't recommend using chip select to qualify reads and writes; just use 'read' and 'write' for that.
2) That coding style probably doesn't scale to multiple registers very well. What I typically do is put the address decoding, byte enable, and write qualification into one spot, and when I need more registers I just replicate that one-liner.
3) For reads I wouldn't bother qualifying the byte lanes with the byte enables; just register the entire word (the master will have to filter out the unused byte lanes anyway).
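
To make points 2 and 3 concrete, here is a rough behavioral sketch in Python rather than HDL. It is only a model of the idea, not the code from this thread: the class name RegisterSlave, the clock() method, and the 32-bit word with a 4-bit byte enable are all assumptions made for illustration.

```python
WORD_BYTES = 4  # assumption: 32-bit data path with a 4-bit byte enable


def byte_mask(byteenable):
    """Expand a per-byte enable into a per-bit mask for one word."""
    mask = 0
    for lane in range(WORD_BYTES):
        if byteenable & (1 << lane):
            mask |= 0xFF << (8 * lane)
    return mask


class RegisterSlave:
    """Hypothetical model of a simple register-bank slave."""

    def __init__(self, num_regs):
        self.regs = [0] * num_regs
        self.readdata = 0

    def clock(self, address, read, write, writedata=0, byteenable=0xF):
        """Model one clock edge. No chip select: only read and write qualify."""
        # Reads: register the entire word, ignoring the byte enables.
        if read:
            self.readdata = self.regs[address]

        mask = byte_mask(byteenable)
        for index in range(len(self.regs)):
            # The replicated "one-liner": address decode, write
            # qualification, and byte-lane merge all in one spot.
            if write and address == index:
                self.regs[index] = (self.regs[index] & ~mask) | (writedata & mask)
```

Adding a register in this model is just one more pass through the same line, which is the scaling property I'm after in the HDL version.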
For masters, whenever possible try to decouple the control and data paths. For example, if your master is capable of issuing multiple reads, try to minimize the amount of control logic involved in capturing the read data. In my various DMA master implementations I typically do this by using a FIFO, and the only interaction between it and the control logic is that the FIFO full, empty, and used signals are used to throttle the control logic.
You mentioned you were looking to do video. Search for "Modular SGDMA Video" on alterawiki.com and you should find a dirt simple implementation of a video pipeline that is capable of re-displaying frames when there are no more frames available to display (in video you have to keep displaying, otherwise your display will lose sync).
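
Going back to the FIFO-based decoupling for masters, here is a rough Python sketch of the idea. It is a behavioral model under my own assumptions, not one of my actual DMA masters: the class ReadMasterModel, the fixed read latency, and the one-read-per-cycle issue rate are made up for illustration. The point is that the data path just pushes returning words into the FIFO, and the control logic only looks at how full the FIFO is.

```python
from collections import deque


class ReadMasterModel:
    """Hypothetical model of a read master with a FIFO on the data path."""

    def __init__(self, memory, fifo_depth, latency):
        self.memory = memory          # list of words standing in for the slave
        self.fifo = deque()           # read data FIFO
        self.fifo_depth = fifo_depth
        self.latency = latency        # assumed fixed read latency, >= 1 cycle
        self.in_flight = deque()      # [cycles_remaining, address] per read
        self.next_address = 0

    def clock(self, pop):
        """Model one clock edge; returns the word popped this cycle or None."""
        # Consumer side: drain the FIFO independently of the control logic.
        popped = self.fifo.popleft() if (pop and self.fifo) else None

        # Data path: returning read data goes straight into the FIFO.
        for entry in self.in_flight:
            entry[0] -= 1
        if self.in_flight and self.in_flight[0][0] <= 0:
            _, address = self.in_flight.popleft()
            self.fifo.append(self.memory[address])

        # Control path: throttled only by FIFO usage plus reads already
        # posted, so returning data can never overflow the FIFO.
        used = len(self.fifo) + len(self.in_flight)
        if self.next_address < len(self.memory) and used < self.fifo_depth:
            self.in_flight.append([self.latency, self.next_address])
            self.next_address += 1

        return popped
```

Counting the outstanding reads against the FIFO depth is the one piece of bookkeeping the control logic needs; everything else about capturing data stays out of it.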