In SOPC Builder you should place all your memories and peripherals at base addresses as close to 0 as possible; this gives the most efficient hardware.
To bypass the data cache, your best plan is probably to use the IORD/IOWR macros for all register accesses.
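As a rough illustration of register access through IORD/IOWR, here is a minimal read-modify-write sketch. On a real Nios II target the macros come from <io.h> and take a base address plus a 32-bit register number; the stand-in definitions below are only an assumption so the sketch compiles on a host PC, and they do not bypass any cache. The helper name reg_set_bits is hypothetical.

```c
#include <stdint.h>

/* On a Nios II target, IORD/IOWR are provided by <io.h>.
 * These stand-ins exist only so the sketch builds on a host;
 * they index 32-bit registers from a base address like the real
 * macros, but they are plain volatile accesses, not cache-bypassing ones. */
#ifndef IORD
#define IORD(base, regnum) \
    (((volatile uint32_t *)(uintptr_t)(base))[(regnum)])
#define IOWR(base, regnum, data) \
    (((volatile uint32_t *)(uintptr_t)(base))[(regnum)] = (uint32_t)(data))
#endif

/* Hypothetical helper: set the bits in 'mask' in register 'regnum'
 * of the peripheral at 'base', and return the value written. */
static uint32_t reg_set_bits(uintptr_t base, unsigned regnum, uint32_t mask)
{
    uint32_t value = IORD(base, regnum);  /* read the current register value */
    value |= mask;                        /* set the requested bits */
    IOWR(base, regnum, value);            /* write the result back */
    return value;
}
```

On hardware you would pass a base address from system.h instead of a pointer to an ordinary array.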
For accesses to memory shared with other masters (other processors, DMA controllers, etc.) you can either flush the data cache after you write to the shared memory (or before you read from it), or set bit 31 of the address using the HAL function call (alt_remap_uncached() in sys/alt_cache.h).
If you take the second route then you can choose to access some locations within a memory with bit 31 set and other parts of the same memory with bit 31 clear. This will work correctly, and there will be no performance hit on the regions accessed with bit 31 clear.
But if your code accesses the same location in the memory sometimes with bit 31 set and sometimes with it clear, then you must flush the cache in between to ensure it works correctly.
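The bit 31 rule can be pictured as plain address arithmetic. The sketch below assumes a 32-bit address where bit 31 only selects cache bypass and the lower 31 bits pick the actual location; all the function names are hypothetical and are not HAL calls. On a real system you would let the HAL set the bit for you and call its data cache flush routine where the last helper returns true.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumption: bit 31 of a data address selects cache bypass. */
#define DCACHE_BYPASS_BIT UINT32_C(0x80000000)

/* Address that reaches the same location while bypassing the data cache. */
static uint32_t uncached_alias(uint32_t addr)
{
    return addr | DCACHE_BYPASS_BIT;
}

/* Address that reaches the same location through the data cache. */
static uint32_t cached_alias(uint32_t addr)
{
    return addr & ~DCACHE_BYPASS_BIT;
}

/* True when both addresses refer to the same underlying location. */
static bool same_location(uint32_t a, uint32_t b)
{
    return cached_alias(a) == cached_alias(b);
}

/* True when two accesses hit the same location, one cached and one
 * uncached. This is the case where a flush is needed in between. */
static bool needs_flush_between(uint32_t a, uint32_t b)
{
    bool bypass_differs = ((a ^ b) & DCACHE_BYPASS_BIT) != 0;
    return same_location(a, b) && bypass_differs;
}
```

Two different locations in the same memory never trigger the check, which matches the point above that mixing cached and uncached regions within one memory is fine.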
P.S. The 256MB limit is imposed by the call instruction, which can only jump within the same 256MB region.
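To make the 256MB figure concrete: the call instruction carries a 26-bit word-aligned immediate, so it supplies 28 address bits and the top 4 bits of the target come from the program counter. A minimal sketch of that arithmetic follows; the function names are hypothetical, and it glosses over exactly which PC value the core uses for the top bits.

```c
#include <stdint.h>
#include <stdbool.h>

/* 26-bit immediate shifted left by 2 gives 28 address bits = 256MB. */
#define CALL_REGION_SHIFT 28u
#define CALL_IMM26_MASK   UINT32_C(0x03FFFFFF)

/* Target address formed from the PC's top 4 bits and the immediate. */
static uint32_t call_target(uint32_t pc, uint32_t imm26)
{
    return (pc & UINT32_C(0xF0000000)) | ((imm26 & CALL_IMM26_MASK) << 2);
}

/* True when 'target' is word aligned and lies in the same 256MB
 * region as 'pc', so a direct call could encode it. */
static bool call_can_reach(uint32_t pc, uint32_t target)
{
    bool same_region = (pc >> CALL_REGION_SHIFT) == (target >> CALL_REGION_SHIFT);
    bool word_aligned = (target & 3u) == 0;
    return same_region && word_aligned;
}
```

Code that has to jump outside its region would need an indirect call through a register instead.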