Altera_Forum
12 years ago
That's correct. All of the example designs on the Altera site have readme files, and most of them require additional compiler flags to improve the generated kernel. In the case of the 2D FFT, memory is partitioned instead of using the default interleaved behavior (--sw-dimm-partition), and the floating-point hardware is optimized to remove intermediate rounding operations, which consume additional hardware (--fpc). This is often called "fused math".
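As a rough illustration, the small Python sketch below assembles a compile command with the two flags mentioned for the 2D FFT example. It assumes the offline compiler is invoked as `aoc` with the kernel file first and the flags after it. The file name `fft2d.cl` and the `build_aoc_command` helper are hypothetical placeholders. The example's own readme remains the authority on which flags it actually needs.

```python
import shlex

# Flags named for the 2D FFT example; the descriptions paraphrase the post.
# Hypothetical table: check the example's readme for the real flag list.
FFT2D_FLAGS = {
    "--sw-dimm-partition": "partition memory instead of the default interleaving",
    "--fpc": "remove intermediate floating-point rounding ('fused math')",
}


def build_aoc_command(kernel_file, flags):
    """Return the argument list for an offline-compiler invocation.

    Hypothetical helper: the order (kernel file first, then flags) is an
    assumption made for illustration.
    """
    # Iterating a dict yields its keys in insertion order.
    return ["aoc", kernel_file, *flags]


cmd = build_aoc_command("fft2d.cl", FFT2D_FLAGS)  # placeholder file name
print(shlex.join(cmd))  # aoc fft2d.cl --sw-dimm-partition --fpc
```

Keeping the flags in a table next to a short description makes it easier to see why each one is there when comparing against the readme.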
Was -O3 the only flag you passed in? I suspect I know what happened, but we should have issued a more user-friendly message in that case, so I would like to reproduce this on my end. That example uses a mix of NDRange and single work-item (task) kernels, so I think that when you passed -O3, the compiler attempted to optimize the task as if it were an NDRange kernel. In general, I would avoid the -O3 optimization when a task kernel is involved. A task operates on only a single work-item, so the compiler has no opportunity to throw more hardware at the kernel to improve its performance.
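To make the NDRange/task distinction concrete, here is a plain-Python model. It is not OpenCL C, and the vector-add bodies and launch helpers are hypothetical illustrations. It mimics only how many work-items each style launches. An NDRange launch runs the kernel body once per work-item, which gives the compiler independent copies of work to spread across extra hardware. A task runs as one work-item that loops over the data itself, so there is nothing of that kind to replicate.

```python
# Plain-Python model of the two execution styles (NOT OpenCL code).
# It only mimics how many work-items each style launches.


def vadd_ndrange_body(gid, a, b, out):
    """NDRange-style body: each work-item handles the single index gid.
    (In OpenCL C this index would come from get_global_id(0).)"""
    out[gid] = a[gid] + b[gid]


def vadd_task_body(a, b, out):
    """Single work-item (task) body: one work-item walks the whole range."""
    for i in range(len(a)):
        out[i] = a[i] + b[i]


def launch_ndrange(body, global_size, *args):
    """Emulate an NDRange launch: run the body once per work-item.
    Returns the number of work-items launched."""
    for gid in range(global_size):
        body(gid, *args)
    return global_size


def launch_task(body, *args):
    """Emulate a task launch: exactly one work-item runs the body."""
    body(*args)
    return 1


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
out_nd = [0.0] * len(a)
out_task = [0.0] * len(a)

nd_items = launch_ndrange(vadd_ndrange_body, len(a), a, b, out_nd)
task_items = launch_task(vadd_task_body, a, b, out_task)

print(nd_items, task_items)  # 4 1
print(out_nd == out_task)    # True
```

Both styles produce the same result. The model shows only that the task version exposes a single work-item, which is the situation the advice about -O3 refers to.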