A kernel with a single 512-bit, burst-coalesced, aligned memory access, running at 300 MHz with interleaving disabled, should give you ~95% of theoretical peak bandwidth. This is certainly not a realistic case for the majority of designs, but the memory controller is so primitive that any other case will give you lower (probably much lower) memory performance.
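To put rough numbers on that, here is a back-of-the-envelope sketch of the arithmetic. The 512-bit width and 300 MHz come from the case above; the memory side is purely an assumption for illustration (one DDR4-2400 bank with a 64-bit data bus), so substitute the figures for your own board. The function names are made up for this example.

```python
def kernel_port_bandwidth_gbs(width_bits: int, fmax_mhz: float) -> float:
    """Bandwidth one memory port can request: bytes per cycle times cycles per second."""
    return (width_bits / 8) * (fmax_mhz * 1e6) / 1e9


def dram_bank_peak_gbs(transfers_mts: float, bus_bits: int) -> float:
    """Theoretical peak of one external memory bank: transfers per second times bytes per transfer."""
    return (transfers_mts * 1e6) * (bus_bits / 8) / 1e9


# Kernel side: one 512-bit access at 300 MHz (values from the post).
port_bw = kernel_port_bandwidth_gbs(512, 300)

# Memory side: ASSUMED DDR4-2400 bank with a 64-bit bus; not stated in the post.
bank_peak = dram_bank_peak_gbs(2400, 64)

# The ~95% efficiency figure from the post, applied to the assumed bank peak.
expected = 0.95 * bank_peak

print(f"kernel port demand : {port_bw:.2f} GB/s")
print(f"assumed bank peak  : {bank_peak:.2f} GB/s")
print(f"~95% of bank peak  : {expected:.2f} GB/s")
```

Under these assumptions the kernel port can request about 19.2 GB/s, which happens to match the assumed bank's peak, so the ~95% figure would work out to roughly 18.2 GB/s. If your kernel runs below 300 MHz or the access is narrower than 512 bits, the port itself becomes the limit before the memory does.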
The discussions in this thread might also help you:
https://forums.intel.com/s/question/0D50P00003yyTK3SAM/global-memory-access-512-bit-width-constrain?language=en_US
There are more discussions on the issue of memory bandwidth if you search the forum.