Altera_Forum
Honored Contributor
10 years ago

Work-group size and logic utilization

Hi,

I am currently experimenting with a matrix-XOR kernel (similar to the Altera matrix-multiplication example, with the multiplication replaced by a bit-wise exclusive-or). The loop in the kernel is fully unrolled. I find that the work-group size setting has a tremendous effect on the logic utilization report.

For example, if the work-group size is set to (64, 64, 1), the logic utilization shown in the report is 16%. When the work-group size is (128, 128, 1), the logic utilization is 46%, which is easy to understand since more bit-wise exclusive-or operations are done in the fully unrolled loop. However, when I change the work-group size to (80, 80, 1), the logic utilization increases to 123%, which I cannot understand.
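For readers who want to reproduce the setup, here is a rough plain-C sketch of the computation described above. It is not the original kernel: the names `matrix_xor_ref` and `matrix_xor_tiled` are hypothetical, the element type is assumed to be `unsigned`, and it assumes that only the multiply is replaced by XOR while the accumulation stays an addition. The `block` parameter stands in for the work-group edge (64, 80 or 128), and the heap buffers stand in for local memory.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Naive reference: c[i][j] = sum over k of (a[i][k] ^ b[k][j]).
 * Assumption: only the multiply is replaced by XOR; the accumulation
 * stays an addition, as in ordinary matrix multiplication. */
void matrix_xor_ref(const unsigned *a, const unsigned *b, unsigned *c, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            unsigned acc = 0;
            for (size_t k = 0; k < n; ++k)
                acc += a[i * n + k] ^ b[k * n + j];
            c[i * n + j] = acc;
        }
    }
}

/* Tiled version. Each (bi, bj) pair plays the role of one work-group and
 * each (li, lj) pair the role of one work-item. Returns 0 on success,
 * -1 if block does not divide n or an allocation fails. */
int matrix_xor_tiled(const unsigned *a, const unsigned *b, unsigned *c,
                     size_t n, size_t block)
{
    if (block == 0 || n % block != 0)
        return -1;

    size_t tile = block * block;
    unsigned *a_tile = malloc(tile * sizeof *a_tile); /* stand-in for local memory */
    unsigned *b_tile = malloc(tile * sizeof *b_tile); /* stand-in for local memory */
    unsigned *acc = malloc(tile * sizeof *acc);       /* one accumulator per work-item */
    if (!a_tile || !b_tile || !acc) {
        free(a_tile);
        free(b_tile);
        free(acc);
        return -1;
    }

    for (size_t bi = 0; bi < n; bi += block) {
        for (size_t bj = 0; bj < n; bj += block) {
            memset(acc, 0, tile * sizeof *acc);
            for (size_t bk = 0; bk < n; bk += block) {
                /* Stage one tile of A and one tile of B. */
                for (size_t li = 0; li < block; ++li) {
                    for (size_t lj = 0; lj < block; ++lj) {
                        a_tile[li * block + lj] = a[(bi + li) * n + (bk + lj)];
                        b_tile[li * block + lj] = b[(bk + li) * n + (bj + lj)];
                    }
                }
                /* The k loop has `block` iterations; it corresponds to the
                 * loop that is fully unrolled in the kernel. */
                for (size_t li = 0; li < block; ++li)
                    for (size_t lj = 0; lj < block; ++lj)
                        for (size_t k = 0; k < block; ++k)
                            acc[li * block + lj] +=
                                a_tile[li * block + k] ^ b_tile[k * block + lj];
            }
            /* Write the finished output tile back. */
            for (size_t li = 0; li < block; ++li)
                for (size_t lj = 0; lj < block; ++lj)
                    c[(bi + li) * n + (bj + lj)] = acc[li * block + lj];
        }
    }

    free(a_tile);
    free(b_tile);
    free(acc);
    return 0;
}
```

In this sketch the unrolled loop has exactly `block` iterations, so an 80-wide tile sits between the 64 and 128 cases in operation count. That suggests the jump to 123% comes from something other than the XOR operations themselves.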

Can anyone offer suggestions or recommendations about this behavior? Does it mean the compiler prefers work-group sizes that are powers of 2?

Thanks.

2 Replies

  • Altera_Forum
    Honored Contributor

    My guess is that the optimizer fails to do a good job with the size of (80, 80). Can the problem possibly be simplified for powers of two? Have you tried to implement the problem as a single work item kernel? Those tend to be more efficient and the compiler is more predictable.

  • Altera_Forum
    Honored Contributor

    --- Quote Start ---

    My guess is that the optimizer fails to do a good job with the size of (80, 80). Can the problem possibly be simplified for powers of two? Have you tried to implement the problem as a single work item kernel? Those tend to be more efficient and the compiler is more predictable.

    --- Quote End ---

    Thanks for the reply. What I actually want to know is whether it is OK to declare a local memory (with the same dimensions as the work-group) whose size is not a power of two. For example, when SIMD is set to 8 for the matrix-XOR kernel, a 128 * 128 local memory per work-group uses more than 100% of the memory blocks on the FPGA. So I want to know whether it is possible to use an 80 * 80 local memory while keeping SIMD at 8, so that more of the FPGA's memory blocks are used (but still less than 100%).
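As a back-of-the-envelope aid for the local-memory question above, the following C sketch compares the raw size of one square tile with the size it would take if the tool padded the element count up to the next power of two. Everything here is an assumption made for illustration: the helper names are hypothetical, elements are assumed to be 32 bits wide, and the power-of-two padding is a guess about how a memory might be sized, not documented compiler behavior. Replication or banking caused by SIMD 8 is not modeled, so the compiler's area report remains the authoritative number.

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest power of two that is >= x (returns 1 for x == 0 or x == 1). */
size_t next_pow2(size_t x)
{
    size_t p = 1;
    while (p < x)
        p <<= 1;
    return p;
}

/* Bytes for one side x side tile of 32-bit elements, with no padding.
 * Assumption: 32-bit elements; the thread does not state the type. */
size_t tile_bytes_exact(size_t side)
{
    return side * side * sizeof(uint32_t);
}

/* Bytes for the same tile if the element count were rounded up to a
 * power of two. Hypothetical model, not confirmed compiler behavior. */
size_t tile_bytes_padded(size_t side)
{
    return next_pow2(side * side) * sizeof(uint32_t);
}
```

Under these assumptions an 80 * 80 tile needs 25,600 bytes unpadded and 32,768 bytes padded, against 65,536 bytes for a 128 * 128 tile. So even in the padded case the 80 * 80 tile would take roughly half the memory of the 128 * 128 one, before any replication for SIMD.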