NDRange size, offsets and work-group size

Honored Contributor

12 years ago

Since the underlying hardware is flexible, tuning the work-group size in the kernel file is mostly about efficiency: hardware footprint and compiler optimizations. If the kernel compiler knows the work-group size in advance, it can make sure that only the hardware you need is created.

So if you know the maximum work-group size ahead of time, I recommend specifying the max_work_group_size attribute. If you don't, the compiler will generate hardware to handle a work-group size of 256, which might be overkill in terms of hardware. The reqd_work_group_size attribute has the same benefits as max_work_group_size, except that it sets the work-group size in stone. That sometimes gives an additional footprint reduction, and it also gives the compiler more information to perform more aggressive optimizations.

It's fairly difficult to give recommendations on work-group size, because the effect it has on the underlying hardware is very algorithm dependent. One thing to keep in mind is that the hardware is flexible, so you are not limited to coding your kernel to match a fixed architecture; experimentation is probably necessary.

My recommendation is to make things like the work-group size configurable in the kernel using macros, and to try different sizes by compiling the kernel with the -c option. That way only the accelerator gets generated instead of the final programming file, which is the time-consuming part of the compilation flow, so you can sweep different sizes to see which one works best. You'll want to generate the reports as well, by passing in the --report and --estimate-throughput flags, so that you can keep track of whether the metrics improved. Often when I'm tuning a kernel, I make the work-group size configurable through a macro and then set its value by passing the flag "-D <MACRO_NAME>=<MACRO_VALUE>" to the compiler.
