Forum Discussion

GRodr25
New Contributor
5 years ago
Solved

Load/store cannot be vectorised - local memory

Hello, I'm having some trouble with local memory and SIMD in a matrix transpose kernel I'm adapting from GPU code. The code: #define TILE_DIM 4 __attribute__((reqd_work_group_size(TILE_DIM, TILE_DIM,...
  • HRZ
    5 years ago

    That particular compiler warning is very misleading and does not always point to an actual problem in your code. Looking at the report, both the load from and the store to global memory are coalesced into 128-bit accesses, which indicates that vectorization is working correctly.

    The local buffer "tile" is also replicated 28 times to provide fully parallel, stall-free accesses. A factor of 4 comes from your code having 4 non-coalescable reads on line 28 and one coalescable write on line 19: each Block RAM has two ports, the write is connected to all replicas while each read is connected to only one, so 4 reads plus one write require 4 replicas. The buffer is then replicated by a further factor of 7 to support 7 work-groups running concurrently in the same compute unit, giving 4 × 7 = 28 in total. This latter replication factor is a compiler decision that the user cannot override.

    All in all, there is nothing wrong with your code, and I would say you can safely ignore the warning.