I'll assume the hardware works (I'm one of the few North Americans who use VHDL lol).
But looking at your software: if you are using a data cache, your code as written is not bypassing it. For register access, declare your pointers volatile (so the compiler doesn't optimize the accesses away), and use the IORD and IOWR macros from the HAL's io.h to bypass the data cache. Here is the syntax:
IORD(base, offset);
IOWR(base, offset, data);
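base is the base address of your hardware (your component's base address from system.h), offset is the register number within your component's address span, and data is the value you are writing. Note that IORD/IOWR scale the offset by the bus width, so on a 32-bit Nios II an offset of 1 means the next 32-bit register (base + 4 bytes), not the next byte. These are the 32-bit calls; there are also IORD_8DIRECT/IOWR_8DIRECT and IORD_16DIRECT/IOWR_16DIRECT (and _32DIRECT) for 8- and 16-bit operations, and those take a byte offset instead.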
Cheers.