SKGR0

New Contributor

6 years ago

Solved

DDR4 Memory Access with Target Device - Arria10

I have generated altera_emif IP with the following parameters:

Protocol : DDR4
Target Device: Arria10
Memory Clock frequency : 1200 MHz
Clock Rate of user logic: Quarter
User logic clock: 300 MHz
DQ Width : 32 bits
amm_readdata and amm_writedata : 256 bits

The above configuration summarizes to the following statements:

1. The FPGA receives 64 bits from the DDR4 on every 1200 MHz memory clock cycle (32 bits on the rising edge and 32 bits on the falling edge).
2. The Avalon interface runs at 300 MHz (quarter rate).
3. The Avalon interface transfers 256 bits of data (32 * 8) on every 300 MHz clock cycle.
4. Bandwidth = 1200 * 10^6 Hz * 2 * 32 bits / 10^9 = 76.8 Gbit/s.

Is my understanding correct? Please confirm.
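
The arithmetic in the summary above can be cross-checked from both sides of the controller. The following is a minimal Python sketch, not part of the EMIF IP; the function names are hypothetical and the numbers are the ones quoted in this post.

```python
def memory_side_gbps(mem_clk_mhz: float, dq_width_bits: int) -> float:
    """Theoretical peak on the DQ bus: two transfers per memory clock (DDR)."""
    return mem_clk_mhz * 1e6 * 2 * dq_width_bits / 1e9


def avalon_side_gbps(user_clk_mhz: float, data_width_bits: int) -> float:
    """Theoretical peak on the Avalon bus: one word per user clock."""
    return user_clk_mhz * 1e6 * data_width_bits / 1e9


mem_gbps = memory_side_gbps(1200, 32)      # 1200 MHz memory clock, 32-bit DQ
avalon_gbps = avalon_side_gbps(300, 256)   # 300 MHz user clock, 256-bit data

print(f"memory side: {mem_gbps:.1f} Gbit/s")
print(f"Avalon side: {avalon_gbps:.1f} Gbit/s")
```

Both sides come out to the same figure, which is what a quarter-rate interface with an 8x wider data bus should give.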

8 Replies

Deshi_Intel
Regular Contributor
6 years ago
Hi,
Your understanding of all four points (1 to 4) is correct.
One thing to note: the bandwidth calculated so far is the theoretical maximum bandwidth.
Actual data throughput varies depending on the following factors:
1. Whether the user design can process and transfer data on every clock cycle, and whether it accesses SDRAM addresses sequentially or randomly.
2. The DDR4 IP controller cannot accept a data transfer on every clock cycle. It deasserts the avalon_ready signal whenever it is busy and unable to accept a transfer.
3. The DDR4 SDRAM cannot accept a data transfer on every clock cycle, because of read/write turnaround timing and SDRAM refresh cycle requirements.
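
To get a feel for one of these factors, the refresh overhead can be estimated as the ratio of refresh time to refresh interval. This is only a rough Python sketch: the tRFC and tREFI values below are assumed typical figures, not values from this thread, and should be replaced with the ones from the datasheet of the actual memory device. Controller stalls and read/write turnaround are not modeled.

```python
def refresh_overhead(t_rfc_ns: float, t_refi_ns: float) -> float:
    """Fraction of time the SDRAM is busy refreshing (tRFC out of every tREFI)."""
    return t_rfc_ns / t_refi_ns


def effective_bandwidth_gbps(peak_gbps: float, efficiency: float) -> float:
    """Scale the theoretical peak by an efficiency factor between 0 and 1."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return peak_gbps * efficiency


PEAK_GBPS = 76.8      # theoretical maximum from the question
T_RFC_NS = 350.0      # ASSUMED refresh cycle time; check the device datasheet
T_REFI_NS = 7800.0    # ASSUMED average refresh interval; check the datasheet

overhead = refresh_overhead(T_RFC_NS, T_REFI_NS)
refresh_limited = effective_bandwidth_gbps(PEAK_GBPS, 1.0 - overhead)

print(f"refresh overhead: {overhead:.1%}")
print(f"upper bound with refresh only: {refresh_limited:.1f} Gbit/s")
```

Under these assumptions refresh alone costs a few percent; the other factors listed above usually reduce the achievable throughput further.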
Thanks.
Regards,
dlim
- JET60200
  Contributor
  6 years ago
  Is there an AMM DMA Linux driver example for the host side? I could not find one. Thanks a lot.
- SKGR0
  New Contributor
  6 years ago
  Thanks!!
  In addition to the above query:
  I observed that DDR4 limits the burst length to 8 (BL8).
  Does this mean that, if the DQ width is 32, one DDR read request returns a maximum of 256 bits (32 * 8)?
Deshi_Intel
Regular Contributor
6 years ago
Hi,
There are two sides to the data transaction flow:
User logic <=> DDR4 IP <=> DDR4 SDRAM
BL8 applies to transactions between the DDR4 IP and the DDR4 SDRAM, as defined by the JEDEC spec.
I believe the higher burst count you saw in the example design is on the User logic <=> DDR4 IP side, right?
User logic can send a large amount of data to the DDR4 IP, but it is queued inside the IP and then transferred to the DDR4 SDRAM as BL8 bursts.
I hope this clears up your doubt. Thanks.
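
The relationship between the two sides can be sketched as simple bit accounting. The Python below is illustrative only: the function name is hypothetical, and how the IP actually queues and schedules commands internally is not described in this thread.

```python
def bl8_commands_needed(amm_burstcount: int,
                        amm_width_bits: int = 256,
                        dq_width_bits: int = 32,
                        burst_length: int = 8) -> int:
    """Number of BL8 SDRAM bursts needed to carry one Avalon burst."""
    bits_per_sdram_burst = dq_width_bits * burst_length   # 32 * 8 = 256 bits
    total_bits = amm_burstcount * amm_width_bits
    # Ceiling division, in case the widths do not divide evenly.
    return -(-total_bits // bits_per_sdram_burst)


# With the configuration in this thread, one 256-bit Avalon beat maps to
# exactly one BL8 burst, so amm_burstcount = 58 needs 58 BL8 bursts.
print(bl8_commands_needed(58))
```

So a large amm_burstcount does not contradict BL8: it only describes how many words the user logic hands over in one Avalon transaction.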
Regards,
dlim
- SKGR0
  New Contributor
  6 years ago
  Hi,
  Thanks, it is clear now.
  Next, I am calculating the DDR4 latency.
  The time between issuing the read request and retrieving the first word from memory is:
  Latency = (CAS latency / data rate in MT/s) * 2000 nanoseconds
  Example: for DDR4-2400 (memory clock 1200 MHz, data rate 2400 MT/s) with CL = 15:
  Latency = (15 / 2400) * 2000 = 12.5 nanoseconds
  My question is:
  If I request a burst count of 32 (4 * BL8), what would be the total latency to receive the data?
  Is it 4 (BL8) read requests * 12.5 = 50 nanoseconds?
  Or 1 read request * 12.5 = 12.5 nanoseconds?
  Thanks in advance!
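
As a rough cross-check of the CAS latency arithmetic: CL counts memory clock cycles, so in nanoseconds it is CL * 1000 / clock in MHz, which is the same as CL * 2000 / data rate in MT/s. The Python sketch below also gives an idealized SDRAM-side time for several BL8 reads, assuming (this is an assumption, not something stated in the thread) that the reads hit an open row and can be issued back to back every 4 clocks, so CL is paid once. Real devices only allow that spacing under certain bank-group conditions, and controller and PHY latency are not included, so actual latency will be higher.

```python
def cas_latency_ns(cl_cycles: int, mem_clk_mhz: float) -> float:
    """CAS latency in ns: CL memory clock cycles at the given clock."""
    return cl_cycles * 1000.0 / mem_clk_mhz


def ideal_read_time_ns(cl_cycles: int, mem_clk_mhz: float, num_bl8: int) -> float:
    """Idealized time from read command to last data beat.

    ASSUMES seamless back-to-back BL8 reads: CL is paid once, then each
    BL8 occupies 4 memory clocks on the data bus.
    """
    t_ck_ns = 1000.0 / mem_clk_mhz
    return (cl_cycles + 4 * num_bl8) * t_ck_ns


first_word = cas_latency_ns(15, 1200)          # DDR4-2400, CL = 15
four_bursts = ideal_read_time_ns(15, 1200, 4)  # 4 x BL8 = 32 beats

print(f"CAS latency: {first_word:.2f} ns")
print(f"4 x BL8, idealized: {four_bursts:.2f} ns")
```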
Deshi_Intel
Regular Contributor
6 years ago
Hi,
Sorry, Intel FPGA does not have a Linux DMA driver example, because we provide the DDR4 memory controller IP rather than system-level application solutions.
Regarding your question on a burst length of 8:
Yes, one read request with a burst length of 8 transfers a total of 256 bits of data (32 x 8).
Note that the whole burst takes 4 memory clock cycles, with two data transfers per clock cycle (rising edge + falling edge).
Each transfer (beat) carries only 32 bits, and the 256 bits are delivered as 8 beats from a single read command.
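
The timing described above can be checked with a few lines of Python. This is only a sketch using the numbers from this thread (1200 MHz memory clock, 300 MHz user clock, 32-bit DQ); the function names are hypothetical.

```python
def bits_per_burst(dq_width_bits: int, burst_length: int = 8) -> int:
    """Total bits delivered by one burst: one DQ-wide beat per transfer."""
    return dq_width_bits * burst_length


def burst_duration_ns(mem_clk_mhz: float, burst_length: int = 8) -> float:
    """Time one burst occupies the data bus: two beats per memory clock."""
    clocks = burst_length / 2
    return clocks * 1000.0 / mem_clk_mhz


burst_bits = bits_per_burst(32)            # 8 beats of 32 bits
burst_ns = burst_duration_ns(1200)         # 4 clocks at 1200 MHz
user_clk_period_ns = 1000.0 / 300          # one quarter-rate user clock

print(f"{burst_bits} bits in {burst_ns:.2f} ns")
print(f"user clock period: {user_clk_period_ns:.2f} ns")
```

One BL8 burst lasts as long as one 300 MHz user clock cycle, which is why a single 256-bit Avalon word corresponds to a single BL8 burst in this configuration.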
Thanks.
Regards,
dlim
- SKGR0
  New Contributor
  6 years ago
  Previous query:
  I observed that DDR4 limits the burst length to 8 (BL8).
  Does this mean that, if the DQ width is 32, one DDR read request returns a maximum of 256 bits (32 * 8)?
  A further question on the burst length of 8:
  I instantiated a DDR4 controller IP for an Arria 10 device and simulated the example design.
  I found that amm_burstcount = 58 in the example design.
  This seems to contradict the statement that the DDR4 IP constrains the burst length to 8 (fixed BL8).
  Can someone please clarify this?
  Thanks in advance!
Deshi_Intel
Regular Contributor
6 years ago
Hi,
For estimated latency figures, refer to the Arria 10 EMIF user guide (page 418, table 394):
https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/ug/ug-20115.pdf
Thanks.
Regards,
dlim
