Forum Discussion

Altera_Forum
7 years ago

Compute units on emulator

Hi,

I would like to know whether the emulator supports multiple compute units. I do not have a physical board to work with, so if my kernel had more than one compute unit, how would the emulator handle it? Thanks for your help.

7 Replies

  • Altera_Forum

    Can you please clarify what you mean by "multiple compute units"?

  • Altera_Forum

    Yes, of course. I mean using more than one compute unit through the kernel attribute "num_compute_units(N)". That way, I should be able to process different work-groups simultaneously. I do not know whether this is possible with the emulator. Thanks.

  • Altera_Forum

    num_compute_units(N) is a basic compiler feature for NDRange kernels and it is fully supported in the emulator. The user does not need to change anything in the host or kernel code other than adding the attribute to the kernel. The compiler automatically handles pipeline replication and the distribution of work-groups over the multiple compute units. Functionally, the emulator behaves exactly as the actual hardware does. However, you should not expect to see a speed-up in the emulator from this attribute, since the emulator is not timing-accurate.

  • Altera_Forum

    OK, thanks. But when I open report.html to see the details of the kernel, the compute unit section always shows 1 CU, even though I use the attribute to increase the count. I am referring to the Vector Add example, in which I used more than one work-group. Should this number change according to the number of CUs I request?

  • Altera_Forum

    I think it should. However, the vector_add example only uses one work-group (there is no get_local_id() in the kernel), and hence there is no point in using num_compute_units. I guess that is why the number of CUs does not increase in the report. Still, for some reason, the area utilization goes up when you increase the number of CUs, which means the compiler is changing the circuit.

  • Altera_Forum

    The CL_DEVICE_MAX_COMPUTE_UNITS value reported by the emulator or the FPGA itself is always "1", regardless of how many compute units you have in your kernel. This value depends on the characteristics of the OpenCL device (and not the kernel running on it) and it has no meaning in the particular case of FPGAs, since these devices do not have a fixed architecture.
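
To illustrate the point made in the replies about work-groups and compute units, here is a minimal sketch in plain C. It is not SDK code. It only works through the arithmetic of a one-dimensional NDRange launch, assuming the global size is an exact multiple of the local size. The helper names work_group_count and max_busy_compute_units are hypothetical. Since a work-group executes on a single compute unit, the number of compute units that can be busy at once is bounded by the number of work-groups.

```c
#include <stddef.h>

/* Hypothetical helper: number of work-groups in a 1-D NDRange launch.
 * Assumes global_size is an exact multiple of local_size.
 * Returns 0 for an invalid local_size of 0. */
static size_t work_group_count(size_t global_size, size_t local_size)
{
    if (local_size == 0) {
        return 0;
    }
    return global_size / local_size;
}

/* Hypothetical helper: upper bound on compute units that can be busy at
 * the same time. A work-group runs on exactly one compute unit, so the
 * bound is min(groups, units). Here "units" stands for the N passed to
 * __attribute__((num_compute_units(N))) on the kernel. */
static size_t max_busy_compute_units(size_t groups, size_t units)
{
    return groups < units ? groups : units;
}
```

For example, a launch of 1024 work-items with a local size of 64 gives 16 work-groups, so a kernel built with 4 compute units can keep all 4 busy. With a local size of 1024 there is a single work-group, and only one compute unit can be used no matter how many were requested.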