Forum Discussion

MAsla5
6 years ago

What does Performance saturation mean when we Increase SIMD size?

Hi,

I am accelerating my application on an Altera FPGA. When I go with SIMD 32, the resource usage drops instead of increasing. I read somewhere that this is performance saturation. My question is: how can I prove it? Where can I find the answer to this question? Could I find it somewhere in the report?

Thank you.

10 Replies

  • MEIYAN_L_Intel

    Hi,

    May I clarify what you mean by "when I go with SIMD 32 the resources drop instead of increasing" — do you mean that you are increasing the number of SIMD work items to 32, am I right?

    Thanks

  • MAsla5

    Hi,

    Yes, you're right. Usually, if resource usage exceeds one hundred percent, the offline compilation is terminated, but what happens in the case of SIMD 32?

    Thanks!

  • MAsla5

    Hi,

    When I look at the reports, only two memory banks are created. In the case of SIMD 16, I can see 16 memory banks in the reports.

    Is there a memory-bound issue? If so, what is it? Please guide me on this matter.

    Thanks!

  • MEIYAN_L_Intel

    Hi,

    May I have the kernel code and report file for further investigation?

    Thanks

    • MAsla5

      Hi,

      You can see the same behavior with Intel's matrix multiplication design example. I have checked with that as well.

      Thanks!

    • HRZ

      That is because the compiler does not support SIMD sizes above 16; if you choose such a SIMD size, it will automatically revert to a SIMD size of 1, and hence resource utilization will decrease. There should be a warning about this in the compilation log, or at least there was one before. A lot of the important warnings have been removed in newer versions of the compiler; I hope this one is still there.

      Of course, there is zero logical reason to have any restriction on SIMD size for FPGAs since, unlike GPUs, FPGAs do not have a fixed architecture; however, it has been like this since the very first version of the compiler and will probably never change.

      • HRZ

        Not really; this has nothing to do with memory bandwidth. It is an artificial compiler limitation. The following compiler warning is generated when compiling your kernel:

        Compiler Warning: Kernel Vectorization: requested number of SIMD work items is larger than  ... cannot vectorize efficiently beyond OpenCL widest vector type.

        If you write the kernel using the single work-item model and use an unroll factor of 32, which has a similar effect to a SIMD size of 32 in an NDRange kernel, the kernel will compile just fine, and the area usage will keep increasing as you increase the unroll factor. Depending on your kernel and FPGA size, you might not be able to fit the design with any SIMD size at all (even 1), or you might still be able to fit it with a hypothetical SIMD size of 32 or more. The compiler cannot know whether your design will fit without placing and routing it; hence, it will not terminate the compilation even if some resource is expected to be overutilized. Note that the area utilization numbers you get from the "-report" switch are estimates, and the final area utilization could be higher or lower.
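        As a minimal sketch of the two styles (using a hypothetical vector-add kernel, not the original poster's code), the NDRange version requests SIMD vectorization through a kernel attribute, which the compiler caps at 16, while the single work-item version uses an unroll pragma, which has no such cap:

        ```c
        // NDRange kernel: SIMD width is requested via an attribute.
        // num_simd_work_items requires reqd_work_group_size, and the
        // work-group size must be divisible by the SIMD width; values
        // above 16 trigger the warning quoted above and revert to 1.
        __attribute__((num_simd_work_items(16)))
        __attribute__((reqd_work_group_size(64, 1, 1)))
        __kernel void vec_add_ndrange(__global const float *restrict a,
                                      __global const float *restrict b,
                                      __global float *restrict c)
        {
            size_t i = get_global_id(0);
            c[i] = a[i] + b[i];
        }

        // Single work-item kernel: the unroll factor is not capped, so
        // 32 (or more) compiles fine, and area scales with the factor.
        __kernel void vec_add_swi(__global const float *restrict a,
                                  __global const float *restrict b,
                                  __global float *restrict c,
                                  const int n)
        {
            #pragma unroll 32
            for (int i = 0; i < n; i++)
                c[i] = a[i] + b[i];
        }
        ```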

        Memory bandwidth depends on a lot of factors, only one of which is SIMD/unroll size. You can find a comprehensive analysis of memory performance on Intel FPGAs in the following document:

        https://arxiv.org/abs/1910.06726