Forum Discussion
10 Replies
- MEIYAN_L_Intel
Frequent Contributor
Hi,
May I clarify: by "When I go with SIMD 32 the resources drops apart of increasing", do you mean that you are increasing the SIMD work-item count to 32, am I right?
Thanks
- MAsla5
New Contributor
Hi,
Yes, you're right. Usually, if resource usage exceeds one hundred percent, the offline compilation is terminated, but what happens in the case of SIMD 32?
Thanks!
- MAsla5
New Contributor
Hi,
When I look at the reports, only two memory banks are created. In the SIMD 16 case, I can see 16 memory banks in the reports.
Is there a memory-bound issue? If so, what is it? Please guide me on this matter.
Thanks!
- MEIYAN_L_Intel
Frequent Contributor
Hi,
May I have the kernel code and report file for further investigation?
Thanks
- MAsla5
New Contributor
Hi ,
You can observe the same behavior with the Intel design example of matrix multiplication. I have checked with that as well.
Thanks!
- HRZ
Frequent Contributor
That is because the compiler does not support SIMD sizes above 16; if you choose such a SIMD size, it will automatically revert to a SIMD size of 1 and hence resource utilization will decrease. There should be a warning about this in the compilation log, or at least there used to be one. A lot of the important warnings have been removed in the newer versions of the compiler; I hope this one is still there.
Of course, there is no logical reason to have any restriction on SIMD size for FPGAs since, unlike GPUs, FPGAs do not have a fixed architecture; however, it has been like this since the very first version of the compiler and will probably never change.
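For illustration, here is a minimal, hypothetical NDRange kernel (not the poster's code) showing where the SIMD attribute is applied. The compiler only honors num_simd_work_items for power-of-two values up to 16, and it must be paired with reqd_work_group_size; requesting 32 makes the compiler silently fall back to a SIMD width of 1:

```
// Hypothetical NDRange vector-add kernel.
// num_simd_work_items is only honored for powers of two up to 16;
// a value of 32 here would be ignored and revert to SIMD 1.
__attribute__((reqd_work_group_size(64, 1, 1)))
__attribute__((num_simd_work_items(16)))
__kernel void vec_add(__global const float *restrict a,
                      __global const float *restrict b,
                      __global float *restrict c) {
    size_t i = get_global_id(0);
    c[i] = a[i] + b[i];
}
```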
- HRZ
Frequent Contributor
Not really; this has nothing to do with memory bandwidth. It is an artificial compiler limitation. The following compiler warning is generated when compiling your kernel:
Compiler Warning: Kernel Vectorization: requested number of SIMD work items is larger than ... cannot vectorize efficiently beyond OpenCL widest vector type.
If you write the kernel using the single work-item model and use an unroll factor of 32, which has a similar effect to using a SIMD size of 32 in an NDRange kernel, the kernel will compile just fine and the area usage will keep increasing as you increase the unroll factor. Depending on your kernel and FPGA size, you might not be able to fit the design with literally any SIMD size (even 1), or you might still be able to fit it with a hypothetical SIMD size of 32 or more. The compiler cannot know whether your design will fit without placing and routing it; hence, it will not terminate the compilation if some resource is expected to be overutilized. Note that the area utilization numbers you get from the "-report" switch are estimates, and the final area utilization could be higher or lower.
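As a sketch of the single work-item alternative described above (again a hypothetical kernel, not the poster's code), the unroll pragma takes over the role of the SIMD attribute and is not capped at 16:

```
// Hypothetical single work-item vector-add kernel.
// The unroll factor plays the role of SIMD width and is not
// limited to 16; increasing it keeps increasing area usage.
__kernel void vec_add_swi(__global const float *restrict a,
                          __global const float *restrict b,
                          __global float *restrict c,
                          int n) {
    #pragma unroll 32
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```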
Memory bandwidth depends on a lot of factors, only one of which is SIMD/unroll size. You can find a comprehensive analysis of memory performance on Intel FPGAs in the following document:
- MEIYAN_L_Intel
Frequent Contributor
Hi,
For your information, chapter 7.3.1 of https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/opencl-sdk/aocl-best-practices-guide.pdf describes the limitations of the num_simd_work_items attribute.
Thanks