How OpenCL synthesizes hardware on FPGA

Honored Contributor

8 years ago

In the specific case of vector-add, whether the kernel is NDRange or single work-item, the compiler will create one adder and three ports to global memory (two reads and one write), plus some buffers between global memory and the kernel to absorb possible stalls, and some registers to allow pipelining. For vectors of length N, 2N values will be read from global memory and N values will be written, with three values being read or written per clock (two reads and one write), so at best one addition completes per clock. This will result in poor performance; hence, SIMD (for NDRange kernels) and unrolling (for single work-item kernels) can be used to increase the number of adders that are synthesized and to widen the ports to memory, so that more data is loaded and added per clock.
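To make the two options concrete, here is a rough sketch of what the kernels could look like. The kernel names, the factor of 4, and the work-group size of 64 are my own assumptions for illustration, not something from the question; `num_simd_work_items`, `reqd_work_group_size` and `#pragma unroll` are the usual Intel FPGA SDK for OpenCL mechanisms. The small shim at the top only exists so the single work-item kernel can also be compiled and checked as plain C on a host.

```c
/* Sketch only. Names, the factor 4 and the work-group size 64 are assumptions. */
#ifndef __OPENCL_VERSION__
/* Host-side shim: lets the single work-item kernel compile as plain C. */
#define __kernel
#define __global
#endif

/* Single work-item form: unrolling by 4 asks the compiler for four adders
   and memory ports four floats wide, instead of one adder and scalar ports.
   A plain C compiler simply ignores the pragma (possibly with a warning). */
__kernel void vadd_swi(__global const float *restrict a,
                       __global const float *restrict b,
                       __global float *restrict c,
                       int n)
{
    #pragma unroll 4
    for (int i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

#ifdef __OPENCL_VERSION__
/* NDRange form: num_simd_work_items(4) vectorizes the datapath four wide.
   It needs a fixed work-group size that is divisible by the SIMD factor. */
__attribute__((reqd_work_group_size(64, 1, 1)))
__attribute__((num_simd_work_items(4)))
__kernel void vadd_ndr(__global const float *restrict a,
                       __global const float *restrict b,
                       __global float *restrict c)
{
    int i = get_global_id(0);
    c[i] = a[i] + b[i];
}
#endif
```

With a factor of 4, each version should ideally consume eight input values and produce four results per clock, provided the memory interface can actually supply that much data; beyond that point a larger factor only costs area.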
