Forum Discussion
Altera_Forum
Honored Contributor
8 years ago

There is no documentation on this, but it can be calculated by taking the kernel and memory operating frequencies and the width of the memory bus into account.
The bus width per bank for DDR3 and DDR4 memory is 64 bits (72 bits with ECC). The memory operating frequency is generally 1600 MT/s for DDR3 (800 MHz clock, double data-rate) and 2133 MT/s for DDR4 (1066 MHz clock). The memory controller on the FPGA runs at 1/8 of the external memory clock for these two memory types. Hence, the memory bandwidth will be saturated if the kernel runs at the same frequency as the memory controller and its accesses are at least 8 times wider than the memory bus. For example, with 2 banks of DDR4 memory at 2133 MT/s, a kernel running at ~266 MHz with a total access size of bus_width x number_of_banks x (memory_frequency/kernel_frequency) = 64 x 2 x 8 = 1024 bits per clock will saturate the memory bandwidth.

Of course, this will only happen in a 100% ideal scenario:

- Your kernel has only one read and one write, both of which are 512 bits wide
- The number of reads and writes is the same
- The read and write buffers are manually partitioned into the two banks
- Memory accesses are all aligned
- The kernel is running exactly at 266 MHz
- There is no other source of stalling in your kernel (e.g. channels)

In practice, you might still get a marginal performance improvement with a total access size larger than 1024 bits per clock, since the memory controller is far from perfect; but if you have multiple access ports to memory, performance is more likely to go down with larger accesses, rather than up. At the end of the day, you should try to minimize the number of access ports to external memory, but maximize the size of the accesses.
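The calculation above can be sketched as a small script. This is just an illustrative helper (the function name and parameters are mine, not from any Altera/Intel tool) that plugs the numbers from the post into the formula:

```python
def required_access_bits(bus_width_bits, num_banks, mem_freq_mhz, kernel_freq_mhz):
    """Total access size (bits per kernel clock) needed to saturate
    the external memory bandwidth, per the formula:
    bus_width x number_of_banks x (memory_frequency / kernel_frequency)."""
    return bus_width_bits * num_banks * (mem_freq_mhz / kernel_freq_mhz)

# Example from the post: 2 banks of DDR4 at 2133 MT/s, 64-bit bus per bank,
# kernel running at the memory controller clock (2133 / 8 ~= 266 MHz).
kernel_freq = 2133 / 8
print(required_access_bits(64, 2, 2133, kernel_freq))  # 1024.0 bits per clock
```

With one 512-bit read and one 512-bit write per clock, the kernel hits exactly this 1024-bit total, which is why the ideal scenario above lists those widths.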