Forum Discussion

Petkov_Alex
Occasional Contributor
1 month ago

S10 HPS FPGA2SDRAM bridge low speed

Hello, I have a problem.

I have a project on a Stratix 10 with HPS.

I need to use DDR4 with the HPS, so I enabled three fpga2sdram bridges, each 128 bits wide.

I configured and enabled them via U-Boot SMC calls, but the measured speed is not enough.

When I use only one bridge, the speed is approximately 32 Gbit/s (bridge and master frequency 350 MHz).

But when I use all three bridges, the total drops to 20 Gbit/s.

Each AXI fpga2sdram bridge is driven through an Avalon-MM (AVMM) bridge with 128-bit width, max pending writes 16, and max burst size 128. I only write to the bridges.

ECC in the EMIF (HMC) is disabled.

I tried the QoS settings for the bridges and set them to bypass.

P.S. If I use the same board with firmware that does not use the HPS (HPS disabled), I get 130 Gbit/s.

Quartus Pro 21.4

Any help would be useful!

8 Replies

  • KianHinT_altera
    Frequent Contributor

    Hi Petkov_Alex​ 
    Apologies for the delay in responding to your issue. I have been looking at this case and have some questions about parts of your description.

    • WREADY is often 0, which means that the slave (AXI in this case) is not ready to receive data. AWREADY is 1 almost all the time.

    AWREADY being always high means the HPS interconnect (or the cache coherency unit) is accepting your write addresses: its address queue has room and is waiting for the next one.

    WREADY being often low (0) means the write data FIFO in the HPS is full. It cannot push data to SDRAM fast enough to keep up with the incoming writes, which creates backpressure and a bottleneck in the system.

    I saw there is a KDB article on low Stratix 10 memory write performance, but it concerns the HBM side and was supposedly fixed in Quartus 22.2:
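
To put a number on the backpressure, one option is to count the cycles in which WVALID and WREADY are both high, since a data beat is transferred only on those cycles. The following is a minimal sketch, assuming you can export per-cycle samples of the two signals (for example from Signal Tap) as lists of 0/1 values. The function name and the sample trace are hypothetical.

```python
def write_throughput_gbit(wvalid, wready, data_width_bits, clock_hz):
    """Estimate W-channel throughput in Gbit/s from per-cycle samples.

    A beat is transferred only in cycles where WVALID and WREADY are both 1.
    """
    if len(wvalid) != len(wready):
        raise ValueError("wvalid and wready must have the same length")
    if not wvalid:
        return 0.0
    beats = sum(1 for v, r in zip(wvalid, wready) if v and r)
    utilization = beats / len(wvalid)
    return utilization * data_width_bits * clock_hz / 1e9


if __name__ == "__main__":
    # Hypothetical trace: the master always has data (WVALID = 1),
    # the slave is ready in 5 out of every 7 cycles.
    wvalid = [1] * 700
    wready = ([1] * 5 + [0] * 2) * 100
    rate = write_throughput_gbit(wvalid, wready, 128, 350e6)
    print(f"estimated write throughput: {rate:.1f} Gbit/s")
```

With a 128-bit bus at 350 MHz, a WREADY duty cycle of about 5/7 corresponds to roughly 32 Gbit/s, which is the single-bridge figure reported above. The real duty cycle has to come from a capture on the actual design.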

    What could be the reason for the low write performance Intel® Stratix® 10 MX/NX FPGA High Bandwidth Memory (HBM2) IP write response path in AXI backpressure mode? | Altera Community - 347633

    I will look for other documentation related to this case.

    • I use a constant, different address for each bridge, and 128-beat write bursts with 128-bit data, for example 0x20100000, 0x20200000, 0x20300000

    So if I understand correctly, each burst is 128 beats × 16 bytes = 2048 bytes.

    Initially I thought it might be the AXI 4 KB boundary limit, but your burst stays within a single 4 KB region, so that is not an issue.

    0x20100000 (bridge 1), 0x20200000 (bridge 2), 0x20300000 (bridge 3): the gap between them is 0x100000 = 1,048,576 bytes (1 MB). Could the three bridges be accessing the same bank but different rows in parallel, causing DDR row thrashing? This is just an assumption. Would it be possible for you to test with the addresses spaced 256 MB apart (e.g., 0x10000000, 0x20000000, 0x30000000) so that they are more likely to land in different banks or bank groups?
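
The burst-size and boundary check can be written out as a small helper. This is a minimal sketch that only does the arithmetic: an AXI burst must not cross a 4 KB address boundary, so the first and last byte of the burst have to fall in the same 4 KB page. The function names are hypothetical.

```python
def burst_bytes(beats: int, data_width_bits: int) -> int:
    """Total bytes moved by one burst."""
    return beats * data_width_bits // 8


def crosses_4k_boundary(start_addr: int, beats: int, data_width_bits: int) -> bool:
    """True if the burst would span two different 4 KB pages."""
    last_addr = start_addr + burst_bytes(beats, data_width_bits) - 1
    return (start_addr >> 12) != (last_addr >> 12)


if __name__ == "__main__":
    for base in (0x20100000, 0x20200000, 0x20300000):
        print(hex(base),
              burst_bytes(128, 128),
              crosses_4k_boundary(base, 128, 128))
```

A 2048-byte burst is safe whenever its start address is aligned to 2048 bytes. A start address such as 0x20100C00 would cross into the next page.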
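
To compare the two address sets, it helps to look at the spacing and at which address bits actually differ between bridges. The sketch below only does that arithmetic. Which of those bits select the bank, bank group, or row depends on the EMIF address ordering setting, which is not known here, so the code deliberately makes no claim about the DRAM mapping. The helper names are hypothetical.

```python
def spacings(addrs):
    """Byte distance between consecutive base addresses."""
    return [b - a for a, b in zip(addrs, addrs[1:])]


def differing_bits(a: int, b: int):
    """Bit positions (LSB = 0) where two addresses differ."""
    diff = a ^ b
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


if __name__ == "__main__":
    current = [0x20100000, 0x20200000, 0x20300000]
    proposed = [0x10000000, 0x20000000, 0x30000000]
    for name, addrs in (("current", current), ("proposed", proposed)):
        gaps_mib = [g // (1024 * 1024) for g in spacings(addrs)]
        bits = [differing_bits(a, b) for a, b in zip(addrs, addrs[1:])]
        print(name, "gaps (MiB):", gaps_mib, "differing bits:", bits)
```

The current bases differ only in address bits 20 and 21, while the proposed bases differ in bits 28 and 29.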

    Thanks

    Regards

    Kian

     

    • Petkov_Alex
      Occasional Contributor

      Thanks for your answer. I will change the spacing and report the result.

  • Petkov_Alex
    Occasional Contributor

    Any other ideas, or have I already reached the maximum available throughput? If so, I think it would be better to update the performance figures in the documentation or describe the limitations.

  • Hi Alex,

    When you use all 3 bridges, is it 20 Gbit/s for each interface or 20 Gbit/s in total across the 3 interfaces? Is each AXI fpga2sdram bridge connected to its own AVMM interface? And what is the address pattern on each interface: sequential writes or random writes?

    • Petkov_Alex
      Occasional Contributor

      As described in the documentation (AN 802, page 38), each bridge with 128-bit width at 400 MHz provides 6.4 GB/s (51.2 Gbit/s).

      Of course, that is the ideal figure, so I expect about 80% of it.
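
The peak figure is simply bus width times clock frequency, so the same arithmetic can be applied to the 350 MHz clock used in this design and compared with the measured numbers from the first post. This is a minimal sketch of that calculation. The function names are hypothetical, and the 80% target is the poster's own expectation, not a documented limit.

```python
def peak_gbit_per_s(width_bits: int, clock_mhz: float) -> float:
    """Ideal throughput of one bridge: one full-width beat every cycle."""
    return width_bits * clock_mhz / 1000.0


def efficiency(measured_gbit: float, width_bits: int, clock_mhz: float,
               bridges: int = 1) -> float:
    """Measured throughput as a fraction of the combined ideal peak."""
    return measured_gbit / (bridges * peak_gbit_per_s(width_bits, clock_mhz))


if __name__ == "__main__":
    print("peak @ 400 MHz:", peak_gbit_per_s(128, 400), "Gbit/s")
    print("peak @ 350 MHz:", peak_gbit_per_s(128, 350), "Gbit/s")
    print("80% target @ 350 MHz:", 0.8 * peak_gbit_per_s(128, 350), "Gbit/s")
    print(f"1 bridge, 32 Gbit/s:  {efficiency(32, 128, 350):.1%}")
    print(f"3 bridges, 20 Gbit/s: {efficiency(20, 128, 350, bridges=3):.1%}")
```

At 350 MHz the ideal peak is 44.8 Gbit/s per bridge, so the single-bridge result of 32 Gbit/s is about 71% of peak, while 20 Gbit/s across three bridges is about 15% of their combined 134.4 Gbit/s.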

    • Petkov_Alex
      Occasional Contributor

      Thanks for your questions.

      When I use 3 bridges, the total throughput becomes 20 Gbit/s. That is the sum over all 3 bridges.

      Each bridge has its own AVMM interface.

      I use a constant, different address for each bridge, and 128-beat write bursts with 128-bit data, for example 0x20100000, 0x20200000, 0x20300000

      So the writes are sequential.

      • Qingrui_H_Intel
        New Contributor

        Please try changing the "Address Ordering" setting in the EMIF IP to see whether either of the other 2 options improves this. You can also monitor the transactions on the AXI F2SDRAM interface to see the actual write commands there and check whether the AVMM burst commands are being split.
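
One way to check for burst splitting is to capture AWLEN on the AXI side and compare it with the burst length issued on the AVMM side. In AXI, the number of beats in a burst is AWLEN + 1, so an unsplit 128-beat burst shows up as AWLEN = 127. The following is a minimal sketch, assuming the AWLEN value of each accepted write address has been exported as a list of integers. The function name and the sample captures are hypothetical.

```python
from collections import Counter


def summarize_awlen(awlen_samples, expected_beats=128):
    """Histogram of AXI burst lengths and whether any burst is shorter than expected.

    awlen_samples: AWLEN values, one per accepted write address.
    Returns (Counter mapping beats -> count, split_detected).
    """
    beats = Counter(awlen + 1 for awlen in awlen_samples)
    split_detected = any(b < expected_beats for b in beats)
    return beats, split_detected


if __name__ == "__main__":
    # Hypothetical captures, not measured data.
    unsplit = [127, 127, 127]
    split_into_16 = [15] * 8
    print(summarize_awlen(unsplit))
    print(summarize_awlen(split_into_16))
```

If the histogram shows many short bursts instead of 128-beat ones, the interconnect between the AVMM master and the F2SDRAM bridge is breaking the bursts up, which is worth ruling out before changing the EMIF settings.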