Load/store cannot be vectorised - local memory
- 5 years ago
That particular compiler warning is very misleading, and it does not always point to an actual problem in your code. Looking at the report, both the load from and the store to global memory are coalesced into 128-bit accesses, which indicates that vectorization is working correctly. The local buffer "tile" is also replicated 28 times to provide fully parallel, stall-free accesses. A factor of 4 comes from your code having 4 non-coalescable reads on line 28 and one coalescable write on line 19: each Block RAM has two ports, writes are connected to all replicas while each read is connected to only one, so 4 reads and one write need 4 replicas. The buffer is then replicated a further 7 times (4 × 7 = 28) to support 7 work-groups running concurrently in the same compute unit; this latter replication factor is a compiler decision that cannot be overridden by the user. All in all, there is nothing wrong with your code, and I would say you can safely ignore the warning.
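To make the replication arithmetic concrete, here is a minimal sketch of the port-counting reasoning above. It is only a simplified model, not the compiler's actual algorithm: the function name `local_memory_replicas` and its parameters are hypothetical, and it assumes plain two-port Block RAMs, whereas the real compiler may also use techniques such as double-pumping or banking that change the result.

```python
import math


def local_memory_replicas(reads, writes, concurrent_work_groups, ports_per_block=2):
    """Estimate how many copies of a local buffer are needed (simplified model)."""
    # Assumption: every write is wired to all replicas, so each write
    # occupies one port on every replica.
    read_ports_per_replica = ports_per_block - writes
    if read_ports_per_replica <= 0:
        raise ValueError("no port left for reads in this simplified model")
    # Each non-coalescable read needs its own port on some replica.
    replicas_for_ports = math.ceil(reads / read_ports_per_replica)
    # One full set of replicas per work-group resident in the compute unit.
    return replicas_for_ports * concurrent_work_groups


# 4 reads, 1 write, 7 concurrent work-groups, as in the report discussed above.
print(local_memory_replicas(reads=4, writes=1, concurrent_work_groups=7))  # 28
```

With one of the two ports taken by the write, each replica can serve only one read, which gives 4 replicas for the 4 reads, and 7 concurrent work-groups then bring the total to 28.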