Forum Discussion

Altera_Forum
9 years ago

I want to calculate my kernel's ideal execution time from fmax.

This is my acl_quartus_report.txt:

/////////////////////////////////////////////////////////

ALUTs: 7794

Registers: 9,641

Logic utilization: 5,368 / 32,070 ( 17 % ) ( 16 % )

I/O pins: 103 / 457 ( 23 % )

DSP blocks: 0 / 87 ( 0 % )

Memory bits: 348,224 / 4,065,280 ( 9 % )

M10K blocks: 63 / 397 ( 16 % )

Actual clock freq: 135.639999807

Kernel fmax: 135.64

1x clock fmax: 135.64

2x clock fmax: 10000

Highest non-global fanout: 2723

//////////////////////////////////////////////////////////

1)

I can see that Kernel fmax is 135.64.

Does this mean that fmax is 135.64 MHz?

2)

my kernel make 350036 elements and work.

If I calculate the kernel's ideal execution time, excluding memory load/write delay,

is it 1/135.64 MHz * 350056 = 2.58 ms?

Thanks,

3 Replies

  • Altera_Forum

    The fmax is the highest frequency the design can run at. The frequency actually used depends on your hardware, on the oscillator you are using, and on your PLL settings, if you use PLLs.

    I don't understand "my kernel make 350036 elements and work." If your design takes 350036 clock cycles to compute a result, then yes, the execution time will be 1/f * 350036.
  • Altera_Forum

    2)

    If you mean that you run the kernel on 350056 work-items, then to know the ideal execution time you would need to know how many work-items come out of the pipeline per clock cycle. Depending on the kernel structure and instructions, it is not necessarily one; in fact, it will most probably be lower. I don't have a report on hand right now, but I believe you can get that information from the profiler.

    Also, you would need to know the depth of the generated pipeline and calculate the time it takes for the first work-item to be processed.

    All in all, maybe something like this would do:

    1/135.64 MHz * 350056 * clock_cycles_per_work-item + pipeline_processing_time
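
As a rough illustration of the estimate discussed in this reply, here is a minimal Python sketch. The function name and its parameters (`items_per_clock`, `pipeline_depth_cycles`) are hypothetical names chosen for this example, not values or fields from any Altera report. It assumes the pipeline latency is expressed in clock cycles and that memory stalls are ignored. The throughput and depth used in the second call are placeholders only; the real numbers would have to come from the profiler or the compiler reports.

```python
def ideal_kernel_time_s(work_items, fmax_mhz,
                        items_per_clock=1.0, pipeline_depth_cycles=0):
    """Estimate the ideal kernel run time in seconds, ignoring memory stalls.

    items_per_clock and pipeline_depth_cycles are assumptions supplied by
    the caller; they are not read from any report.
    """
    if fmax_mhz <= 0 or items_per_clock <= 0:
        raise ValueError("fmax_mhz and items_per_clock must be positive")
    clock_period_s = 1.0 / (fmax_mhz * 1e6)
    # Cycles spent streaming all work-items through a full pipeline.
    steady_state_cycles = work_items / items_per_clock
    # Extra cycles before the first work-item leaves the pipeline.
    total_cycles = steady_state_cycles + pipeline_depth_cycles
    return total_cycles * clock_period_s


# Best case from the question: one work-item per clock, latency ignored.
best_case = ideal_kernel_time_s(350056, 135.64)
print(f"best case: {best_case * 1e3:.2f} ms")  # best case: 2.58 ms

# Placeholder throughput and depth, purely to show the effect.
slower = ideal_kernel_time_s(350056, 135.64,
                             items_per_clock=0.5, pipeline_depth_cycles=200)
print(f"with placeholder values: {slower * 1e3:.2f} ms")
```

With one work-item per clock and no latency term, this reproduces the 2.58 ms figure from the question. Any throughput below one work-item per clock scales the result up proportionally.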

  • Altera_Forum

    I am grateful for your replies.

    Yes, 350056 means 350036 work-items.

    And I don't know the pipeline depth of my kernel.

    Can you tell me how to find the pipeline depth?

    Thanks.