
Altera_Forum
Honored Contributor
10 years ago

Matrix multiplication example: block memory overhead

Hello Everyone

I am quite new to Altera OpenCL. Recently I tried compiling the Matrix Multiplication example from the OpenCL design examples page (https://www.altera.com/support/support-resources/design-examples/design-software/opencl.html).

To my surprise, the block memory bit usage is very high. When I explored the design in Quartus, most of the Block RAM bits were used by FIFOs and LSUs (load-store units).

Could someone help me understand why the compilation generates these FIFOs and LSUs? I could not find any reference that explains the reasons behind FIFO and LSU generation.

Any guidance would be much appreciated.

2 Replies

  • Altera_Forum
    Honored Contributor

    The kernel reads data from global memory, stores it in local memory, and performs the matrix multiply on the blocks of data it has pulled into local memory. If you use the default matrix multiplication application they provide, the default block size is 64, so the required work-group size is 64x64 = 4096 work-items per work-group. That's my guess: there's a lot of data movement, and how much depends on how big you set the block size.

  • Altera_Forum
    Honored Contributor

    I agree with what okebz said. FIFOs are used to store variables, and LSUs are used to read data from global memory or write data to global memory.
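
The block-size arithmetic from the first reply can be sketched in a few lines of C. This is only a rough estimate under stated assumptions: it assumes the kernel keeps one BLOCK_SIZE x BLOCK_SIZE tile of each input matrix in local memory and that elements are 4-byte floats. The helper names (`work_items_per_group`, `tile_bits`, `local_tile_bits`) are hypothetical and not part of the design example or the SDK.

```c
#include <stddef.h>

/* Work-items in one square work-group of side block_size
   (64 x 64 = 4096 for the default block size in the thread). */
size_t work_items_per_group(size_t block_size) {
    return block_size * block_size;
}

/* Bits needed to hold one block_size x block_size tile of elements.
   elem_bytes = 4 is an assumption (single-precision float). */
size_t tile_bits(size_t block_size, size_t elem_bytes) {
    return block_size * block_size * elem_bytes * 8;
}

/* Assumed lower bound for local memory: one tile of A plus one tile of B.
   This does not include FIFOs, LSU buffers, or any copies the compiler
   may create, so the figure Quartus reports can be larger. */
size_t local_tile_bits(size_t block_size, size_t elem_bytes) {
    return 2 * tile_bits(block_size, elem_bytes);
}
```

With a block size of 64 and 4-byte elements, `tile_bits(64, 4)` gives 131072 bits per tile and `local_tile_bits(64, 4)` gives 262144 bits for both tiles. Halving the block size to 32 cuts this estimate by a factor of four, which is one way to check how much of the reported usage scales with the block size.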