Forum Discussion

SKGR0 (New Contributor)
6 years ago
Solved

DDR4 Memory Access with Target Device - Arria 10

I have generated the altera_emif IP with the following parameters:

  • Protocol: DDR4
  • Target device: Arria 10
  • Memory clock frequency: 1200 MHz
  • Clock rate of user logic: Quarter
  • User logic clock: 300 MHz
  • DQ width: 32 bits
  • amm_readdata and amm_writedata: 256 bits

The above configuration summarizes to the following statements:

  1. The FPGA receives 64 bits from the DDR4 on every 1200 MHz memory clock cycle (32 bits on the rising edge and 32 bits on the falling edge).
  2. The Avalon interface runs at 300 MHz (quarter rate).
  3. The Avalon interface transfers 256 bits of data (32 × 8) on every 300 MHz clock cycle.
  4. Bandwidth = 1200 × 10^6 Hz × 2 × 32 bits / 10^9 = 76.8 Gbit/s.
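As a quick cross-check of point 4, the sketch below computes the peak bandwidth from both sides of the IP using only the parameters listed above: the DQ side (memory clock × 2 edges × 32 bits) and the Avalon side (300 MHz × one full-width word per clock). For a quarter-rate interface the two figures should agree. The constant and function names are illustrative, not part of the IP.

```python
# Peak (theoretical) bandwidth of the DDR4 interface configured above.
# Values come from the altera_emif parameters listed in this post;
# the constant and function names are illustrative only.

MEM_CLK_HZ = 1_200_000_000   # memory clock: 1200 MHz
USER_CLK_HZ = 300_000_000    # quarter-rate user (Avalon) clock: 300 MHz
DQ_WIDTH_BITS = 32           # DQ bus width
EDGES_PER_CLK = 2            # DDR: data on rising and falling edges


def dq_side_bandwidth_bps() -> int:
    """Bits per second on the DQ pins: clock * 2 edges * DQ width."""
    return MEM_CLK_HZ * EDGES_PER_CLK * DQ_WIDTH_BITS


def avalon_width_bits() -> int:
    """Avalon data width: DQ width * 2 edges * (memory clocks per user clock)."""
    rate_ratio = MEM_CLK_HZ // USER_CLK_HZ  # 4 for quarter rate
    return DQ_WIDTH_BITS * EDGES_PER_CLK * rate_ratio


def avalon_side_bandwidth_bps() -> int:
    """Bits per second on the Avalon bus: one full-width word per user clock."""
    return USER_CLK_HZ * avalon_width_bits()


if __name__ == "__main__":
    print("Avalon width:", avalon_width_bits(), "bits")                  # 256 bits
    print("DQ side:", dq_side_bandwidth_bps() / 1e9, "Gbit/s")           # 76.8 Gbit/s
    print("Avalon side:", avalon_side_bandwidth_bps() / 1e9, "Gbit/s")   # 76.8 Gbit/s
```

Computing the Avalon width from the rate ratio (rather than hard-coding 256) makes it explicit why quarter rate and a 32-bit DQ bus lead to a 256-bit amm_readdata/amm_writedata.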

Is my understanding correct? Please confirm.



  • Deshi_Intel (Regular Contributor)

    Hi,

    Your understanding of points 1 to 4 is correct.

    One thing to note is that the bandwidth calculated so far is the theoretical maximum bandwidth.

    Actual data transfer throughput may be lower, depending on the following factors:

    1. Whether the user application is able to process and transfer data on every clock cycle, and whether it accesses SDRAM addresses sequentially or randomly.
    2. The DDR4 IP controller cannot process a data transfer on every clock cycle. The DDR4 IP will deassert the avalon_ready signal when it is busy and unable to accept a data transfer.
    3. The DDR4 SDRAM cannot accept a data transfer on every clock cycle, due to internal read/write turnaround timing requirements and SDRAM refresh cycle requirements.
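One way to see how far a real design falls below the theoretical maximum is to measure it in simulation: count the user clock cycles on which a transfer is actually accepted (the user logic is requesting and the controller's ready signal is high) and scale the peak figure by that ratio. A minimal sketch, assuming a hypothetical per-cycle trace of (request, ready) samples exported from simulation; it counts accepted command/write-data cycles only and ignores read-data return latency.

```python
from typing import Sequence, Tuple

# Assumed values from this thread's configuration.
USER_CLK_HZ = 300_000_000     # quarter-rate Avalon clock
AVALON_WIDTH_BITS = 256       # amm_writedata / amm_readdata width


def effective_bandwidth(trace: Sequence[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Return (efficiency, bits_per_second) for a per-cycle handshake trace.

    Each entry is a hypothetical (request, ready) sample taken on one user
    clock edge. A beat counts as transferred only when the user logic is
    requesting AND the controller's ready signal is high; cycles where ready
    is low (controller busy) or the user is idle add no data.
    """
    if not trace:
        raise ValueError("trace must contain at least one cycle")
    accepted = sum(1 for request, ready in trace if request and ready)
    efficiency = accepted / len(trace)
    return efficiency, efficiency * USER_CLK_HZ * AVALON_WIDTH_BITS


if __name__ == "__main__":
    # Hypothetical trace: the controller is busy for 1 cycle out of 4.
    trace = [(True, True), (True, False), (True, True), (True, True)]
    eff, bps = effective_bandwidth(trace)
    # prints: efficiency=0.75, throughput=57.6 Gbit/s
    print(f"efficiency={eff:.2f}, throughput={bps / 1e9:.1f} Gbit/s")
```

A trace in which ready stays high and the user requests on every cycle reproduces the 76.8 Gbit/s theoretical figure; any busy or idle cycle lowers it proportionally.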

    Thanks.

    Regards,

    dlim

    • JET60200 (Contributor)

      Is there any AMM DMA Linux driver example for the host side? I couldn't find one. Thanks a lot.

    • SKGR0 (New Contributor)

      Thanks!!

      In addition to the above query,

      I observed that DDR4 limits the burst length to 8 (BL8).

      Does this mean that, with a DQ width of 32, one DDR read request can return a maximum of 256 bits (32 × 8)?

  • Deshi_Intel (Regular Contributor)

    Hi,

    There are two sides to the data transaction flow, as shown below.

    • User logic <=> DDR4 IP <=> DDR4 SDRAM

    BL8 applies to data transactions between the DDR4 IP and the DDR4 SDRAM, as defined by the JEDEC specification.

    I believe the higher burst count you see in the example design is on the data flow between the user logic and the DDR4 IP, right?

    The user can send a large amount of data to the DDR4 IP; it will be queued and processed inside the DDR4 IP and later transferred to the DDR4 SDRAM in BL8 bursts.
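With the configuration in this thread, one 256-bit Avalon beat carries exactly as much data as one BL8 burst on the 32-bit DQ bus (32 × 8 = 256 bits), so a long Avalon burst can be viewed as a series of BL8 SDRAM accesses. A minimal sketch of that arithmetic, assuming a one-to-one mapping with no merging, reordering, or burst chop; how the IP actually schedules commands internally is not described here, and the function names are hypothetical.

```python
# Hypothetical helper: how many BL8 SDRAM bursts one Avalon burst maps to,
# assuming each Avalon beat is split into whole BL8 bursts with no merging,
# reordering, or burst chop (the IP's real scheduling is internal to it).

DQ_WIDTH_BITS = 32       # DQ bus width from this thread
BURST_LENGTH = 8         # DDR4 BL8 (JEDEC)
AVALON_WIDTH_BITS = 256  # amm_writedata / amm_readdata width


def bl8_bursts_for_avalon_burst(burstcount: int,
                                avalon_width: int = AVALON_WIDTH_BITS,
                                dq_width: int = DQ_WIDTH_BITS,
                                burst_length: int = BURST_LENGTH) -> int:
    """Number of BL8 bursts needed to carry `burstcount` Avalon beats."""
    if burstcount < 1:
        raise ValueError("burstcount must be >= 1")
    bits_per_bl8 = dq_width * burst_length      # 32 * 8 = 256 bits
    total_bits = burstcount * avalon_width
    if total_bits % bits_per_bl8:
        raise ValueError("Avalon data does not fill whole BL8 bursts")
    return total_bits // bits_per_bl8


def min_dq_clocks(n_bl8: int) -> int:
    """Memory clocks the DQ bus is busy: each BL8 = 8 beats / 2 edges = 4 clocks."""
    return n_bl8 * BURST_LENGTH // 2


if __name__ == "__main__":
    n = bl8_bursts_for_avalon_burst(58)   # amm_burstcount seen in the example design
    print(n, "BL8 bursts,", min_dq_clocks(n), "memory clocks of data")
```

Under these assumptions, an Avalon burst with amm_burstcount = 58 corresponds to 58 BL8 accesses on the SDRAM side, which keeps the DQ bus busy for at least 232 memory clocks.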

    I hope this clears up your doubt. Thanks.

    Regards,

    dlim

    • SKGR0 (New Contributor)

      Hi

      Thanks, it is clear now.

      Next, I am calculating the DDR4 latency.

      The time between issuing the read request and receiving the first word from memory is:

      Latency (ns) = CAS Latency / Memory clock frequency (MHz) × 1000 (equivalently, CAS Latency / data rate (MT/s) × 2000)

      Example: for DDR4-2400, clock frequency 1200 MHz, CL = 15:

      Latency = (15 / 1200) × 1000 = 12.5 nanoseconds

      My question is:

      If I request a burst count of 32 (4 × BL8), what would be the total latency to receive the data?

      Is it 4 (BL8) read requests × 12.5 = 50 nanoseconds?

      Or 1 read request × 12.5 = 12.5 nanoseconds?

      Thanks in advance!
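The latency arithmetic in this post can be written out as a small sketch. It converts CL to nanoseconds (CL × tCK, with tCK = 1000 / f in MHz) and, as a rough model of the burst question, estimates when the DQ bus finishes delivering n back-to-back BL8 bursts. The assumptions are deliberately simplifying: reads are issued back-to-back with no gaps on the DQ bus, each BL8 occupies 4 memory clocks, and controller/PHY pipeline latency inside the FPGA, row activation, precharge, and refresh are all ignored. The function names are hypothetical.

```python
# Rough read-timing model at the SDRAM pins only (hypothetical helpers).
# Assumes: reads issued back-to-back with no gap on the DQ bus, each BL8
# occupying 4 memory clocks, and NO controller/PHY pipeline latency,
# row activation, precharge, or refresh. Real latency seen on the
# Avalon interface will be longer.

MEM_CLK_MHZ = 1200.0  # DDR4-2400


def cas_latency_ns(cl_cycles: int, mem_clk_mhz: float = MEM_CLK_MHZ) -> float:
    """CAS latency in ns: CL * tCK, where tCK = 1000 / f(MHz) ns."""
    return cl_cycles * 1000.0 / mem_clk_mhz


def data_done_ns(cl_cycles: int, n_bl8: int, mem_clk_mhz: float = MEM_CLK_MHZ) -> float:
    """Time from the first read command until the last beat of n BL8 bursts.

    The first burst's data starts after CL; the later reads' CAS latency
    overlaps with earlier transfers, so each burst adds only its 4 clocks
    (8 beats, 2 per clock) on the DQ bus.
    """
    return (cl_cycles + 4 * n_bl8) * 1000.0 / mem_clk_mhz


if __name__ == "__main__":
    print(f"CL=15 @ 1200 MHz: {cas_latency_ns(15):.1f} ns")   # 12.5 ns
    print(f"4 x BL8 done after: {data_done_ns(15, 4):.2f} ns")
```

In this simplified model the CAS latency is not paid once per BL8: four back-to-back bursts finish roughly 25.8 ns after the first command, compared with roughly 15.8 ns for a single burst.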

  • Deshi_Intel (Regular Contributor)

    Hi,

    Sorry, Intel FPGA does not have a DMA Linux driver example; we provide the DDR4 memory controller IP, not system-level application solutions.

    Regarding your question on burst length 8:

    • Yes, one read request with burst length 8 transfers a total of 256 bits of data (32 × 8).
    • Note, however, that this takes 4 memory clock cycles, with two data transfers per clock cycle (rising edge + falling edge).
    • Each individual transfer (beat) carries only 32 bits; the 256 bits are delivered through 8 transfers triggered by a single read command.
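The BL8 timing described in this reply can be illustrated with a short sketch: it lists on which memory clock cycle and edge each of the 8 beats of one read burst arrives, and packs the eight 32-bit beats into one 256-bit word. The beat-to-bit ordering (beat 0 in the least significant bits) is an assumption for illustration only; the IP's actual data ordering on amm_readdata is not stated in this thread.

```python
from typing import List, Tuple

DQ_WIDTH_BITS = 32
BURST_LENGTH = 8


def bl8_schedule(dq_width: int = DQ_WIDTH_BITS,
                 burst_length: int = BURST_LENGTH) -> List[Tuple[int, str, int]]:
    """(memory clock index, edge, bit offset) for each beat of one burst."""
    schedule = []
    for beat in range(burst_length):
        cycle = beat // 2                                 # 2 beats per clock
        edge = "rising" if beat % 2 == 0 else "falling"
        schedule.append((cycle, edge, beat * dq_width))  # assumed LSB-first order
    return schedule


def assemble_burst(beats: List[int], dq_width: int = DQ_WIDTH_BITS) -> int:
    """Pack the beats of one burst into a single wide word (beat 0 in the LSBs)."""
    mask = (1 << dq_width) - 1
    word = 0
    for i, beat in enumerate(beats):
        word |= (beat & mask) << (i * dq_width)
    return word


if __name__ == "__main__":
    for cycle, edge, offset in bl8_schedule():
        print(f"clock {cycle} {edge:>7} edge -> bits [{offset + DQ_WIDTH_BITS - 1}:{offset}]")
    word = assemble_burst(list(range(8)))
    print(f"256-bit word: {word:#066x}")
```

The schedule shows the last beat arriving on the falling edge of the fourth memory clock (index 3), which is why one read command yields 256 bits over 4 clock cycles.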

    Thanks.

    Regards,

    dlim

    • SKGR0 (New Contributor)

      Previous query:

      I observed that DDR4 limits the burst length to 8 (BL8).

      Does this mean that, with a DQ width of 32, one DDR read request can return a maximum of 256 bits (32 × 8)?

      Further to my question on burst length 8:

      I instantiated a DDR4 controller IP for an Arria 10 device and simulated the example design.

      I found that amm_burstcount = 58 in the example design.

      This seems to contradict the statement that the DDR4 IP constrains the burst length to 8 (fixed BL8).

      Can someone please clarify this?

      Thanks in advance!