Forum Discussion
I am sure that with 18.1.x, Hyperflex optimization was automatically disabled when burst non-aligned LSUs were present. I never tested 19.0 or 19.1, but in 19.2 and above this no longer seems to happen. You can infer aligned coalesced ports if, instead of relying on loop unrolling to coalesce the accesses, you use OpenCL vector variables or a struct whose only member is an array with as many elements as the unroll factor, so that you are technically just reading (and writing) one wide value per loop iteration. Of course, this is only feasible if every memory access is aligned to at least the width of the coalesced port (i.e. the offset in bytes is always a multiple of the port size in bytes).
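To make the struct idea concrete, here is a rough sketch written as plain C so that it can be compiled and checked on a host. The names `UNROLL`, `wide_t` and `scale_wide` are made up for illustration. In an actual kernel the pointers would be `__global` and the inner loop would be fully unrolled; whether the compiler then infers an aligned LSU still has to be confirmed in the report.

```c
#include <stddef.h>

#define UNROLL 8  /* hypothetical unroll factor, chosen for illustration */

/* One wide element: a struct whose only member is an array of UNROLL floats. */
typedef struct {
    float data[UNROLL];
} wide_t;

/* Multiply n_wide wide elements by factor.
 * Each outer iteration performs exactly one wide read and one wide write,
 * instead of UNROLL narrow accesses that the compiler would have to coalesce. */
void scale_wide(const wide_t *in, wide_t *out, size_t n_wide, float factor)
{
    for (size_t i = 0; i < n_wide; i++) {
        wide_t v = in[i];                  /* single wide read */
        for (int j = 0; j < UNROLL; j++) { /* would be fully unrolled in a kernel */
            v.data[j] *= factor;
        }
        out[i] = v;                        /* single wide write */
    }
}
```

Since the index `i` counts whole `wide_t` elements, every access starts at a multiple of `sizeof(wide_t)` relative to the buffer base, which is the alignment condition described above.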
This happens with 19.2 as well, or at least it returns the same warning message and disables Hyper Optimization.
Regarding your suggestion of using vector variables (or custom data types), I can see that this will work, but it makes it more difficult to handle the case in which the matrix dimensions are not a multiple of the width of the vector data type.
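One possible way around the non-multiple case, not something proposed earlier in this thread, is to pad each buffer up to the next multiple of the vector width and mask out the padding lanes in the last wide element. Below is a minimal sketch in plain C under that assumption; `VEC`, `vecf_t`, `padded_len` and `sum_padded` are hypothetical names, and the reduction is only a stand-in for the real computation.

```c
#include <stddef.h>

#define VEC 8  /* hypothetical vector width in elements */

/* Wide element with VEC floats, read or written as a single value. */
typedef struct {
    float data[VEC];
} vecf_t;

/* Round a length up to the next multiple of VEC. */
size_t padded_len(size_t n)
{
    return (n + VEC - 1) / VEC * VEC;
}

/* Sum the first n valid floats of a buffer that has been allocated with
 * padded_len(n) elements. Padding lanes are read but ignored. */
float sum_padded(const vecf_t *in, size_t n)
{
    size_t n_wide = padded_len(n) / VEC;
    float acc = 0.0f;
    for (size_t i = 0; i < n_wide; i++) {
        vecf_t v = in[i];                  /* always one full, aligned wide read */
        for (size_t j = 0; j < VEC; j++) {
            if (i * VEC + j < n) {         /* mask out lanes past the real size */
                acc += v.data[j];
            }
        }
    }
    return acc;
}
```

The cost is some wasted memory and bandwidth on the padding, plus the extra comparison per lane; for a matrix, padding would be applied per row so that every row still starts on a wide-element boundary.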