Altera_Forum
Honored Contributor
11 years ago

--- Quote Start ---
Yes, with simd_work_item(2), the compiler generates the same number of processing elements (i.e. 1); however, each processing element is wider so that it does more work. The "num_compute_units" attribute directly replicates each processing element.
--- Quote End ---

I understand now. I have another question, then. If I don't specify the "num_compute_units" attribute, does the compiler generate one compute unit for my kernel?

And for the following example, with BLOCK_SIZE=32 and SIMD_WORK_ITEMS=2:

__attribute((reqd_work_group_size(64,64,1)))
__attribute((num_simd_work_items(SIMD_WORK_ITEMS)))
__attribute((num_compute_units(1)))

does it mean that the runtime will schedule a problem of size 64*64 onto one compute unit on the device, and that this compute unit has 32*16 processing elements?

Thanks!
-Rae