Work-group size and logic utilization

Question

Hi,

I am currently experimenting with a matrix-XOR kernel. It is similar to the Altera matrix-multiplication example, except that the multiplication operation is replaced with a bitwise exclusive-or. The loop in the kernel is fully unrolled. I find that the work-group size setting has a tremendous effect on the logic utilization shown in the report.

For example, if the work-group size is set to (64, 64, 1), the logic utilization shown in the report is 16%. When the work-group size is (128, 128, 1), the logic utilization is 46%, which is easy to understand since more bitwise exclusive-or operations are performed in the fully unrolled loop. However, when I change the work-group size to (80, 80, 1), the logic utilization increases to 123%, which I cannot understand.

Can anyone offer suggestions or an explanation for this behavior? Does it mean the compiler prefers work-group sizes that are powers of 2?

Thanks.
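As a quick check of the three sizes mentioned above, the following C sketch tests whether a work-group dimension is a power of two and computes the next power of two above it. The helper names `is_power_of_two` and `next_power_of_two` are hypothetical and are not part of any Altera tool. Whether the offline compiler pads or rounds sizes internally is an assumption here, not something the utilization report confirms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true when x is a power of two (1, 2, 4, ...). */
bool is_power_of_two(size_t x) {
    return x != 0 && (x & (x - 1)) == 0;
}

/* Hypothetical helper: smallest power of two that is >= x. */
size_t next_power_of_two(size_t x) {
    assert(x > 0);
    size_t p = 1;
    while (p < x) {
        p <<= 1;
    }
    return p;
}
```

With these helpers, 64 and 128 are reported as powers of two, while 80 is not and rounds up to 128. So (80, 80, 1) is the only one of the three sizes that falls between two powers of two.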

altera_forum · Answer

My guess is that the optimizer fails to do a good job with the size of (80, 80). Can the problem possibly be simplified for powers of two? Have you tried to implement the problem as a single work item kernel? Those tend to be more efficient and the compiler is more predictable.
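To illustrate the single work-item idea, here is a minimal sketch written as plain C so it can be run on a host. It assumes that "matrix XOR" means the product in matrix multiplication is replaced by a bitwise XOR while the partial results are still summed. The original post does not spell this out, and the function name `matrix_xor` is made up for this example. In an actual OpenCL kernel, the same loop nest would sit inside a kernel function with global pointer arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Single work-item style: one thread of control walks the whole n x n
   problem instead of launching one work-item per output element.
   Assumption: a*b from matrix multiplication becomes a^b, and the
   partial results are still accumulated with addition. */
void matrix_xor(const uint32_t *a, const uint32_t *b, uint32_t *c, size_t n) {
    assert(a != NULL && b != NULL && c != NULL && n > 0);
    for (size_t row = 0; row < n; ++row) {
        for (size_t col = 0; col < n; ++col) {
            uint32_t acc = 0;
            for (size_t k = 0; k < n; ++k) {
                /* Row-major indexing into flat arrays. */
                acc += a[row * n + k] ^ b[k * n + col];
            }
            c[row * n + col] = acc;
        }
    }
}
```

Because the loop bounds are ordinary loop limits rather than work-group dimensions, the matrix size and the amount of unrolling can be chosen independently in this form.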

altera_forum · Answer

--- Quote Start ---

My guess is that the optimizer fails to do a good job with the size of (80, 80). Can the problem possibly be simplified for powers of two? Have you tried to implement the problem as a single work item kernel? Those tend to be more efficient and the compiler is more predictable.

--- Quote End ---

Thanks for the reply. What I actually want to know is whether it is OK to use a local memory (the same size as the work-group) whose dimensions are not powers of two. For example, when SIMD is set to 8 for the matrix-XOR kernel, a 128 * 128 local memory per work-group uses more than 100% of the memory blocks on the FPGA. So I want to know whether it is possible to use an 80 * 80 local memory while keeping SIMD at 8, in order to use more of the memory blocks on the FPGA (but less than 100%).
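For reference, the raw storage of the tile sizes discussed in this thread can be compared with a small C sketch. It assumes 4-byte elements and one tile of dim * dim elements. Neither the element type nor the number of tiles is stated in the post, and `tile_bytes` is a hypothetical helper. The number of memory blocks actually used also depends on how the compiler banks or replicates the memory, which this arithmetic does not model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: raw bytes for one square local-memory tile.
   elem_size is an assumption (for example 4 for a 32-bit element). */
size_t tile_bytes(size_t dim, size_t elem_size) {
    assert(dim > 0 && elem_size > 0);
    return dim * dim * elem_size;
}
```

Under these assumptions, a 64 * 64 tile needs 16384 bytes, an 80 * 80 tile needs 25600 bytes, and a 128 * 128 tile needs 65536 bytes. In raw terms the 80 * 80 tile is therefore about 39% of the 128 * 128 one.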
