Forum Discussion

Altera_Forum
Honored Contributor
8 years ago

Kernel performance with profiling

Hello again guys.

I'm struggling to understand the profiler results for two versions of a kernel, one with an unroll factor of 128 and another with an unroll factor of 32.

The 32 unroll factor outperforms the 128 factor by 5 seconds for an input matrix of 20000 x 1000.

Stats are:

Unroll factor: 32 | 128
Activity: 96% | 25%
Global memory BW: 15182 MB/s | 11885 MB/s
Kernel clock frequency: 244 MHz | 185 MHz
Stall: 14.49% | 15.1%

I don't understand what is happening: the stall percentage is slightly higher for the 128 version, while the faster version (32) also has the higher memory bandwidth.
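One way to read the two columns is to divide the reported global memory bandwidth by the kernel clock frequency, which gives the bytes moved per kernel clock cycle. The sketch below does this in Python using only the four figures quoted above. It assumes 1 MB = 10^6 bytes and that the profiler averages both numbers over the same interval; neither is confirmed in this thread.

```python
# Figures copied from the profiler stats in the post.
stats = {
    32: {"bw_mb_s": 15182, "clock_mhz": 244},
    128: {"bw_mb_s": 11885, "clock_mhz": 185},
}

def bytes_per_cycle(bw_mb_s, clock_mhz):
    # Assumption: 1 MB = 10**6 bytes, so the 10**6 factors cancel
    # between MB/s and MHz.
    return bw_mb_s / clock_mhz

for unroll, s in stats.items():
    bpc = bytes_per_cycle(s["bw_mb_s"], s["clock_mhz"])
    print(f"unroll {unroll}: {bpc:.1f} bytes/cycle ({bpc * 8:.0f} bits/cycle)")
```

Both versions come out at roughly 62 to 64 bytes per cycle. Under these assumptions the bandwidth gap tracks the drop in kernel clock frequency (244 MHz to 185 MHz) rather than a change in how much data moves per cycle.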

15 Replies

  • Altera_Forum
    Honored Contributor

    --- Quote Start ---

    If you are talking about the number reported by the profiler, I think it is in 32-bit words, not bytes.

    --- Quote End ---

    The profiler report says that the max burst size is 16, but 16 of what: bits, bytes, or number of requests?

    bandwidth

    Burst Size

    The average burst size of the memory operation. If the memory system does not support burst mode (for example, on-chip RAM), no burst information will be available.

    Average Burst Size = 7.6 (Max Burst = 16)

    Global memory

  • Altera_Forum
    Honored Contributor

    Since the maximum read/write size per kernel clock cycle per DDR3/4 memory bank is 512-bits (standard value hardcoded in the BSP), for a total of 1024 bits for two banks, I have so far assumed that a burst size of 16 equals a read/write size of 1024 bits, hence the unit being 32-bit words. I could be wrong, though.
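The arithmetic behind this guess can be checked directly. The sketch below takes the 512-bit bank width, the two banks, and the max burst of 16 from the posts above and works out which unit size would make a burst of 16 equal 1024 bits. The list of candidate units is hypothetical, chosen only for comparison; nothing here confirms which unit the profiler actually reports.

```python
BANK_WIDTH_BITS = 512   # per bank, per kernel clock cycle (from the post)
NUM_BANKS = 2
MAX_BURST = 16          # "Max Burst" reported by the profiler

total_bits = BANK_WIDTH_BITS * NUM_BANKS   # 1024 bits across both banks

# Unit implied if a burst of 16 really corresponds to 1024 bits.
implied_unit_bits = total_bits // MAX_BURST

# Bits covered by a burst of 16 for a few candidate units (hypothetical list).
candidates = {unit: MAX_BURST * unit for unit in (8, 32, 64, 512)}

print(total_bits, implied_unit_bits, candidates)
```

With 32-bit units a burst of 16 spans 512 bits, which is the width of one bank; matching the 1024-bit two-bank total would need 64-bit units. The thread does not settle which unit the profiler uses.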

  • Altera_Forum
    Honored Contributor

    --- Quote Start ---

    Since the maximum read/write size per kernel clock cycle per DDR3/4 memory bank is 512-bits (standard value hardcoded in the BSP), for a total of 1024 bits for two banks, I have so far assumed that a burst size of 16 equals a read/write size of 1024 bits, hence the unit being 32-bit words. I could be wrong, though.

    --- Quote End ---

    That seems logical :) Last question (for now, I guess): I'm using the de5_net board (http://www.terasic.com.tw/cgi-bin/page/archive.pl?language=english&categoryno=158&no=526&partno=2)

    However, the BSP board spec says that I only have 2 GB for each bank (I'm assuming each DIMM socket is a bank). My question is: is my DDR3 memory clock frequency 933 MHz?
  • Altera_Forum
    Honored Contributor

    The standard DE5-Net board has two 2 GB DDR3 banks, each running at 1600 MHz (800 MHz, double data rate). There is another variation of the board with two 4 GB DDR3 banks, but I am not sure how fast its memory is (probably the same speed as the 2 GB one). The first variation must be the board you have.
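For context, the theoretical peak bandwidth implied by these memory figures can be estimated as below. This is a rough sketch: the 1600 MT/s rate and the two banks come from the post above, while the 64-bit data bus per bank is an assumption (the usual width of a DDR3 module) that is not stated in this thread. The measured figure is the unroll-32 bandwidth from the original post.

```python
TRANSFERS_PER_S = 1600e6   # DDR3-1600: 800 MHz clock, two transfers per cycle
BUS_WIDTH_BITS = 64        # assumption: standard 64-bit DDR3 module data bus
NUM_BANKS = 2

def peak_mb_s(transfers_per_s, bus_width_bits, banks=1):
    # bytes/s = transfers/s * bytes per transfer; 1 MB = 10**6 bytes
    return transfers_per_s * (bus_width_bits / 8) * banks / 1e6

per_bank = peak_mb_s(TRANSFERS_PER_S, BUS_WIDTH_BITS)
both = peak_mb_s(TRANSFERS_PER_S, BUS_WIDTH_BITS, NUM_BANKS)
measured = 15182  # MB/s, unroll-32 figure from the original post

print(per_bank, both, round(measured / both, 2))
```

Under these assumptions the peak is 12800 MB/s per bank and 25600 MB/s for both, so the measured 15182 MB/s is about 59% of the two-bank peak and already more than a single bank could deliver.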

  • ADua0
    Occasional Contributor

    I am asking this question here because I think it might be related. In the profiler report I get an average read burst size of 1 and an average write burst size of 1, with an optimal possible value of 16. I am not sure how to make use of that burst size. Any suggestions on how to achieve it would be really helpful.

    Just to give more information, I am reading 16 data elements at a time and storing them in a shift register.