Improvement of self-written OpenCL function (GaussianBlur)

Honored Contributor

8 years ago

Comparing with the ARM core is probably not very conclusive since the ARM core is extremely slow.

The most obvious way to increase performance on the FPGA would be to unroll the loop on "c". However, since that loop performs a floating-point reduction, you should either fully unroll it, or first restructure it to achieve an initiation interval of one by inferring a shift register, as outlined in "Intel® FPGA SDK for OpenCL Best Practices Guide, 1.6.1.5 Removing Loop-Carried Dependency by Inferring Shift Registers", and then unroll it to achieve the best performance.
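As a rough illustration of the shift-register idea, here is a minimal sketch in plain C that mirrors how the loop body would be restructured inside the kernel. Your actual kernel is not reproduced here, so the names `blur_naive`, `blur_shift_reg`, `px` and `coef` are hypothetical, and the `II_CYCLES` value of 8 is only an assumed adder latency; the compiler's report for your kernel would tell you the value you really need.

```c
#include <assert.h>

/* Assumed latency (in cycles) of the floating-point add; hypothetical value. */
#define II_CYCLES 8

/* Naive reduction: each iteration needs the "sum" written by the previous
 * one, so the loop-carried dependency forces the pipeline to wait for the
 * floating-point add to finish before starting the next iteration. */
static float blur_naive(const float *px, const float *coef, int n)
{
    assert(n >= 0);
    float sum = 0.0f;
    for (int c = 0; c < n; c++)
        sum += px[c] * coef[c];
    return sum;
}

/* Shift-register reduction: each iteration adds to a partial sum that was
 * written II_CYCLES iterations earlier, so consecutive iterations no longer
 * depend on each other. In the OpenCL kernel, the two inner loops below
 * would be marked with "#pragma unroll" so they become plain wiring. */
static float blur_shift_reg(const float *px, const float *coef, int n)
{
    assert(n >= 0);
    float shift_reg[II_CYCLES + 1];

    /* Clear all partial sums (fully unrolled in the kernel). */
    for (int i = 0; i < II_CYCLES + 1; i++)
        shift_reg[i] = 0.0f;

    for (int c = 0; c < n; c++) {
        /* Read the oldest partial sum, write the result to the tail. */
        shift_reg[II_CYCLES] = shift_reg[0] + px[c] * coef[c];

        /* Shift every element down by one (fully unrolled in the kernel). */
        for (int j = 0; j < II_CYCLES; j++)
            shift_reg[j] = shift_reg[j + 1];
    }

    /* Final reduction of the II_CYCLES partial sums (fully unrolled). */
    float sum = 0.0f;
    for (int i = 0; i < II_CYCLES; i++)
        sum += shift_reg[i];
    return sum;
}
```

Note that the shift-register version adds the products in a different order, so for arbitrary inputs the result can differ from the naive loop in the last few bits. Once the main loop reaches an initiation interval of one, it can additionally be unrolled by some factor to process several taps per cycle.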

You should consider fully reading Intel's programming and best practices guides since all the basic optimization techniques are covered there.
