Strange behavior of Quartus Fitter and how to get more information
Hi,
I'm designing an accelerator for DTW computation using oneAPI and a Stratix 10 on the BittWare 520N-MX Gen3x16 board. I have a kernel (actually several different kernels connected with pipes) that I replicate as many times as possible to get the maximum throughput. The different kernel instances work on different input data.
In one version, I fitted 12 kernels in the FPGA. For that kernel, I then simplified the external memory interfaces and the "function overhead" (using oneAPI pragmas). The compile-time estimate of resource utilization shows a reduction of more than 30% per kernel. However, the Fitter still failed to place more than 12 kernels on the FPGA. What seems even stranger to me is that if I try to compile 16 kernels, I get the error:
"Error (170012): Fitter requires 72611 LABs to implement the design, but the device contains only 66099 LABs."
But if I try to compile 14 kernels (same clock target), I get:
"Error (170012): Fitter requires 73646 LABs to implement the design, but the device contains only 66439 LABs"
How could 14 identical kernels need more LABs than 16?
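To make the inconsistency concrete, here is a minimal sketch that parses the two messages quoted above and compares them. The per-kernel figure is a naive division that assumes every LAB belongs to a kernel, so it ignores any fixed overhead (for example, board support logic); the function and variable names are only illustrative.

```python
import re

# Fitter messages copied verbatim from the two failed compiles,
# keyed by the number of kernel replicas in that build.
MESSAGES = {
    16: "Error (170012): Fitter requires 72611 LABs to implement the design, "
        "but the device contains only 66099 LABs.",
    14: "Error (170012): Fitter requires 73646 LABs to implement the design, "
        "but the device contains only 66439 LABs",
}

# Captures the required and the available LAB counts from error 170012.
PATTERN = re.compile(r"requires (\d+) LABs.*contains only (\d+) LABs")


def parse_labs(message):
    """Return (required, available) LAB counts from one Fitter message."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("not a recognized 170012 message: " + message)
    return int(match.group(1)), int(match.group(2))


def summarize(messages):
    """Build a per-build summary: overflow and naive LABs per kernel."""
    rows = {}
    for kernels, message in messages.items():
        required, available = parse_labs(message)
        rows[kernels] = {
            "required": required,
            "available": available,
            "overflow": required - available,
            # Naive assumption: all LABs are attributed to the kernels.
            "labs_per_kernel": required / kernels,
        }
    return rows


SUMMARY = summarize(MESSAGES)
for kernels, row in sorted(SUMMARY.items()):
    print(
        f"{kernels} kernels: required={row['required']} "
        f"available={row['available']} overflow={row['overflow']} "
        f"LABs/kernel={row['labs_per_kernel']:.1f}"
    )
```

Under that assumption, the 14-kernel build asks for roughly 5260 LABs per kernel versus roughly 4538 for the 16-kernel build. The two messages also report different available LAB counts (66439 vs. 66099) for the same device.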
I have tried other kernel counts and clock frequencies, and the results are very unpredictable.
Any idea why the resource utilization estimate is so far off? How can I get more information about the Fitter process to figure out what is happening?
Thanks.