kkvasan
New Contributor
4 years ago
Solved

Stratix 10 oneAPI: Kernel clock vs clock 2x

Hi all,
I am new to Intel FPGAs and am learning about performance optimization with oneAPI.
I read about the HyperFlex routing optimization for Stratix 10 FPGAs. It says we can get 2x core performance because it helps the design operate at a higher frequency. I compiled my design with dpcpp, targeting the pac_s10_usm acceleration board, and the report lists two clocks: the kernel clock at 274 MHz and clock 2x at 548 MHz. When I measure the throughput, however, the design appears to run at the 274 MHz kernel clock. Is it possible to run part of the design on clock 2x using oneAPI?

Many Thanks,
Vasan
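
A quick way to sanity-check which clock the datapath is running at is to convert a measured throughput back into an implied clock frequency. The sketch below assumes a fully pipelined kernel (initiation interval of 1) that consumes a fixed number of items per clock cycle. The function name and the sample measurement are illustrative only and are not taken from the design discussed here.

```python
def implied_clock_mhz(items_processed, elapsed_seconds, items_per_cycle=1):
    """Estimate the clock (in MHz) a fully pipelined kernel must be running at.

    Assumes an initiation interval of 1, i.e. `items_per_cycle` items enter
    the pipeline on every clock cycle, and ignores pipeline fill/drain time.
    """
    if elapsed_seconds <= 0 or items_per_cycle <= 0:
        raise ValueError("elapsed_seconds and items_per_cycle must be positive")
    cycles = items_processed / items_per_cycle
    return cycles / elapsed_seconds / 1e6


# Hypothetical measurement: 274 million items in 1.0 s, one item per cycle.
estimate = implied_clock_mhz(274_000_000, 1.0)
print(f"implied clock: {estimate:.1f} MHz")
```

If the estimate lands near the reported kernel clock rather than near clock 2x, the datapath is running at the 1x clock.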


4 Replies

  • HRZ
    Frequent Contributor

    "It says we can get 2x core performance as it helps the design to operate at a higher frequency."

    That is purely a PR claim that means little in practice. HyperFlex only provides a modest Fmax improvement for routing-congested designs, because it makes more registers available to the router. A 2x improvement will never happen (at least not without also changing the actual code).

    The 2x clock that you see in the report is unrelated to HyperFlex. It is used only for Block RAM double-pumping, which is not enabled by default in Stratix 10 designs, and Intel advises against using it for this FPGA family (https://www.intel.com/content/www/us/en/programmable/documentation/mwh1391807516407.html#jhl1520273455239). Your design will always run at the 1x clock, and the build process automatically makes sure it runs at the maximum possible frequency by re-routing multiple times based on the worst-case slack of the design when synthesized with an unrestricted clock frequency. If you do force Block RAM double-pumping as outlined in Section 10.3 of the documentation linked above, then only the Block RAMs in your design will run at the 2x clock; everything else will still run at the 1x clock.
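
    To make the double-pumping point concrete: clocking a Block RAM at twice the kernel clock lets each of its ports be used twice per kernel clock cycle, and that is all the 2x clock buys. A rough sketch, assuming a Block RAM with two physical ports; the function name is hypothetical and the clock values are the ones from the report in the question.

```python
def accesses_per_kernel_cycle(physical_ports=2, double_pumped=False):
    """Memory accesses one Block RAM can serve per 1x (kernel) clock cycle.

    A double-pumped RAM runs at 2x the kernel clock, so each physical port
    can be used twice within one kernel clock cycle.
    """
    pump_factor = 2 if double_pumped else 1
    return physical_ports * pump_factor


kernel_clock_mhz = 274                  # 1x clock from the report
ram_clock_mhz = 2 * kernel_clock_mhz    # 2x clock, used only by double-pumped RAMs

print(accesses_per_kernel_cycle(double_pumped=False))  # single-pumped RAM
print(accesses_per_kernel_cycle(double_pumped=True))   # double-pumped RAM
print(ram_clock_mhz)
```

    The compute logic, DSPs included, still advances once per 1x clock cycle; only the number of RAM accesses available per kernel cycle changes.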

  • kkvasan
    New Contributor

    Thank you so much, @HRZ, for the detailed information. It would be nice if we could double-pump the DSPs as well.

    Vasan

  • Hi @kkvasan,

    Greetings. Since a solution has been marked in this thread, we assume your question has been addressed, so the thread will no longer be monitored. For new queries, please feel free to open a new thread and we will be right with you. It was a pleasure having you here.

    Best wishes,
    BB