Altera_Forum
Honored Contributor
10 years ago
The kernel reads data in from global memory, stores it in local memory, and performs the matrix multiply on the blocks of data it has pulled into local memory. If you use the default matmult application they provided, the default block size is 64, so the required work group size is set to 64x64 = 4096 total work items for one work group. That's my guess. How much data gets moved around depends on how big you set the block size.
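To make the blocking idea a bit more concrete, here is a minimal plain-Python sketch of a block-by-block matrix multiply. It is not the kernel from the matmult example, just an illustration of the same pattern under some assumptions: the function name `blocked_matmul` and the `global_reads` counter are made up for this post, the matrices are square, and every element copied out of the full matrices into a tile is counted as one "global memory" read.

```python
def blocked_matmul(a, b, block):
    """Multiply square matrices a and b tile by tile.

    Hypothetical sketch: each (bi, bj) pair stands in for one work group,
    and a_tile / b_tile stand in for its local memory.
    """
    n = len(a)
    assert n % block == 0, "matrix size must be a multiple of the block size"
    c = [[0.0] * n for _ in range(n)]
    global_reads = 0  # elements copied from the full matrices into tiles

    for bi in range(0, n, block):
        for bj in range(0, n, block):
            for bk in range(0, n, block):
                # Stage one block x block tile of each input ("local memory").
                a_tile = [row[bk:bk + block] for row in a[bi:bi + block]]
                b_tile = [row[bj:bj + block] for row in b[bk:bk + block]]
                global_reads += 2 * block * block

                # Each (i, j) position plays the role of one work item.
                for i in range(block):
                    for j in range(block):
                        acc = 0.0
                        for k in range(block):
                            acc += a_tile[i][k] * b_tile[k][j]
                        c[bi + i][bj + j] += acc
    return c, global_reads


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    for block in (1, 2):
        result, reads = blocked_matmul(a, b, block)
        print(block, result, reads)
```

In this simplified model each tile has block x block (i, j) positions, which is where 64x64 = 4096 work items would come from for a block size of 64. The read counter works out to 2*n^3/block, so a larger block means fewer trips to "global" memory but a bigger tile to hold locally. Real hardware behavior will differ, so treat the counts as a rough intuition only.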