Forum Discussion

DongWang-BJTU
Occasional Contributor
7 years ago

Performance difference between OpenCL 18.1 Std and Pro for FPGA?

I compiled the same kernel code with both v18.1 Std and Pro. With the Standard version I achieve an fmax of around 220 MHz, but with the Pro version the fmax is only 190 MHz.

I then compared the report.html files: a loop dependence is reported in the Pro version's report, but the Standard version's report shows no problems.

This did not happen with v18.0 and older versions. What has changed in v18.1?

7 Replies

  • Hi,

    Is there a specific example you are using for the comparison?

    If the example is from OpenCL examples provided by Intel, I can try it out on my end.

    Thanks,

    Arslan

    • DongWang-BJTU
      Occasional Contributor

      Sorry, I cannot post the whole kernel code here; it has too many lines. Here are some results that can be seen directly:

      For 18.1 Std, the following code is successfully pipelined with II=1, with no warnings:

      But for 18.1 Pro, an fmax warning is shown in report.html, as can be seen here:

      A dependency is found on the variable find_idle_ch_id here:

      For this reason, an fmax of 190 MHz is obtained for the A10 device, while for the S5 device the fmax is 220 MHz.

      As I understand it, the A10 is a more advanced device than the S5 (Stratix V) and should run at a higher frequency.
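For readers without access to the screenshots, here is a minimal sketch in plain C of the kind of loop in which a dependency on a variable such as find_idle_ch_id can appear. This is not the poster's kernel: the channel count NUM_CH, the busy[] flags, and both function names are hypothetical, and only the variable name find_idle_ch_id comes from the post. Variant A reads the previous value of the variable in its condition; variant B scans in reverse and overwrites unconditionally, which may give an HLS compiler a shorter feedback path, though whether it changes the reported fmax would have to be checked in report.html.

```c
#include <stdint.h>

#define NUM_CH 8 /* hypothetical number of channels */

/* Variant A: the condition reads find_idle_ch_id from the previous
 * iteration, so each iteration depends on the one before it. */
int find_idle_serial(const uint8_t busy[NUM_CH])
{
    int find_idle_ch_id = -1; /* -1 means "no idle channel found" */
    for (int ch = 0; ch < NUM_CH; ch++) {
        if (find_idle_ch_id < 0 && !busy[ch]) {
            find_idle_ch_id = ch;
        }
    }
    return find_idle_ch_id;
}

/* Variant B: scan from the highest channel down and overwrite whenever a
 * channel is idle. The condition depends only on busy[ch], and the last
 * write (lowest idle channel) wins, so the result matches variant A. */
int find_idle_flat(const uint8_t busy[NUM_CH])
{
    int find_idle_ch_id = -1;
    for (int ch = NUM_CH - 1; ch >= 0; ch--) {
        if (!busy[ch]) {
            find_idle_ch_id = ch;
        }
    }
    return find_idle_ch_id;
}
```

Both variants return the lowest-numbered idle channel, or -1 if every channel is busy, so one can be swapped for the other when comparing compiler reports.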

      • Dr_FPGA
        New Contributor

        Keep in mind that the S5 was the "top of the line" FPGA not so long ago. I think the reason the Pro version exists is that Gen10 devices have different metal routing and I/O columns in the middle of the die. Routing across I/O columns and around congested areas is typically the main reason for extra tPD and lower frequency on the A10 vs. the S5.

    • DongWang-BJTU
      Occasional Contributor

      Another odd thing is that 18.1 Pro sometimes generates unreasonable registers for private variables, as follows:

      The variable table_p2s_prefechtor is actually 16 bits wide (unsigned short), but the compiler makes it 512 bits wide, which makes the feedback logic inefficient.

      For the 18.1 Std version, there is no problem:

  • Could you try the latest OpenCL compiler release, 19.1? If the problem persists, you may share the kernel code and the steps to replicate the issue in a private message, and I can help pass this on to the Engineering team. Thanks, Arslan