Forum Discussion

MichaelB (Occasional Contributor)
4 years ago

Avalon to AXI implementation

Hi,

currently I'm considering implementing my own Avalon <-> AXI4 (MM) adapter instead of using the QSYS autogenerated adapter.

We are using an AXI4 DMA that streams data into the DDR4.

Because the DDR4 uses an Avalon interface, I have already tried the autogenerated converter, but it is too slow to support our data rates.

Even with pending transactions set to 64 and a burst size of 16 we are not able to achieve a data rate > 3 Gbps (the AXI side uses burst size 16 as well).

The DDR4 (1600 MHz) Avalon interface runs at 200 MHz @ 512 bit and the DMA at 160 MHz @ 256 bit.

Since the DMA in the 160 MHz domain gets overflows, I can be sure the transaction handling is the issue (I did not expect this, because the receiving side has a higher frequency and double the data width).
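
Just to put rough numbers on this (raw bus bandwidth, ignoring any protocol overhead):

\[
\begin{aligned}
160\ \text{MHz} \times 256\ \text{bit} &\approx 41\ \text{Gbps (DMA / AXI side)}\\
200\ \text{MHz} \times 512\ \text{bit} &\approx 102\ \text{Gbps (DDR4 Avalon side)}\\
3\ \text{Gbps} \,/\, 41\ \text{Gbps} &\approx 7\%\ \text{utilisation of the AXI side}
\end{aligned}
\]

So the converter sustains only a small fraction of what either bus could carry.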

We had the same issues for AXI DMA <-> AXI HBM2 as well.

There we implemented our own AXI <-> AXI connection, which supports our required data rates much better than the autogenerated AXI converter (verified in simulation and on the FPGA).

We already had several debug sessions with Premier Support about this; the final result was to NOT use the autogenerated adapter and to use our own...

Now with DDR4 we are facing the same throughput limitations again (the AXI <-> Avalon conversion is the bottleneck).

Could you give me advice on implementing this conversion?
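
To make the question more concrete, this is roughly the direction I had in mind for the write path. It is a heavily simplified sketch (same data width on both sides, INCR bursts only, a single outstanding burst, no wstrb/byteenable handling, write response generated locally) and all names are placeholders:

    // Illustrative sketch only - not a complete adapter.
    module axi2avmm_wr_bridge #(
      parameter DATA_W = 256,
      parameter ADDR_W = 32
    )(
      input  logic                clk,
      input  logic                rst_n,

      // AXI4 slave, write channels only (minimal subset)
      input  logic [ADDR_W-1:0]   s_axi_awaddr,
      input  logic [7:0]          s_axi_awlen,    // beats - 1
      input  logic                s_axi_awvalid,
      output logic                s_axi_awready,
      input  logic [DATA_W-1:0]   s_axi_wdata,
      input  logic                s_axi_wvalid,
      input  logic                s_axi_wlast,
      output logic                s_axi_wready,
      output logic [1:0]          s_axi_bresp,
      output logic                s_axi_bvalid,
      input  logic                s_axi_bready,

      // Avalon-MM bursting write master
      output logic [ADDR_W-1:0]   avm_address,
      output logic [8:0]          avm_burstcount,
      output logic [DATA_W-1:0]   avm_writedata,
      output logic                avm_write,
      input  logic                avm_waitrequest
    );

      logic              busy;     // a write burst is in flight
      logic [ADDR_W-1:0] addr_q;
      logic [8:0]        burst_q;

      // Accept a new AW only when idle and the previous response was taken.
      assign s_axi_awready = !busy && !s_axi_bvalid;

      always_ff @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
          busy         <= 1'b0;
          s_axi_bvalid <= 1'b0;
        end else begin
          if (s_axi_bvalid && s_axi_bready)
            s_axi_bvalid <= 1'b0;
          if (s_axi_awvalid && s_axi_awready) begin
            addr_q  <= s_axi_awaddr;
            burst_q <= {1'b0, s_axi_awlen} + 9'd1;  // awlen is beats-1
            busy    <= 1'b1;
          end
          // Last beat accepted on the Avalon side -> burst done, issue OKAY.
          if (busy && s_axi_wvalid && s_axi_wlast && !avm_waitrequest) begin
            busy         <= 1'b0;
            s_axi_bvalid <= 1'b1;
          end
        end
      end

      // Each accepted AXI W beat becomes one Avalon write beat.
      // Address/burstcount stay constant for the whole Avalon burst.
      assign avm_address    = addr_q;
      assign avm_burstcount = burst_q;
      assign avm_writedata  = s_axi_wdata;
      assign avm_write      = busy && s_axi_wvalid;
      assign s_axi_wready   = busy && !avm_waitrequest;

      assign s_axi_bresp = 2'b00;  // always OKAY in this sketch

    endmodule

For the real design I would of course still need the data width conversion (256b -> 512b), outstanding transaction handling and the read path.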

Are there any data sheets which already describe the adapter autogenerated by QSYS?

Furthermore, I saw that I can edit the maximum pending read transactions on the DDR4 EMIF core and on the Avalon Clock Crossing Bridges, but not the maximum pending write transactions. Is there a reason why I cannot edit these parameters in those IP cores?

Kind regards,

Michael

14 Replies

  • Hi @MichaelB

    Sorry for the delay in response.

    You may check out the User Guide below for more information on the adapter autogenerated by Platform Designer.

    https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/ug/ug-qpp-platform-designer.pdf

    Could you share a screenshot of the DDR4 EMIF core & Avalon Clock Crossing Bridges that shows you can edit the maximum pending read transactions but not the maximum pending write transactions?

    Best Regards,
    Richard Tan

    p/s: If any answer from the community or Intel support is helpful, please feel free to give Kudos.

    • MichaelB (Occasional Contributor)

      Hi Richard,

      thanks for your reply!

      In the EMIF DDR4 & Avalon CCB settings I'd like to increase pending writes (CCB Avalon_M will write data to Avalon_S of DDR4):

      AXI_M (write) -> Avalon CCB -> DDR4

      AXI burst length = 16 (data width = 256) and Avalon burst length = 8 (data width = 512), based on my calculation:

      16 * 256 = 8 * 512 = 4096 bit per burst

      Will the interconnect resolve data width conversion and align the bursts?

      Screenshots of CCB & EMIF:

      (Screenshots: EMIF DDR4 parameters, EMIF DDR4 AVM settings, Avalon CCB AVM settings, Avalon CCB parameters)

      Let me know if you need further information!

      Best regards,

      Michael

  • Hi @MichaelB

    From what I found, the DDR4 EMIF core and the Avalon-MM Clock Crossing Bridge do not seem to support maximum pending write transactions. The interface must have both the response and writeresponsevalid signals, which these IPs do not have. You may create a custom component, though, by adding the respective signals. FYI, maximum pending read transactions requires the readdatavalid signal.
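
    Roughly, a wrapper component's Avalon-MM slave would need a port list along these lines before those parameters become available (an illustrative SystemVerilog sketch only; the module name and parameter values are placeholders, not taken from the EMIF IP):

      module avmm_slave_with_response #(
        parameter DATA_W = 512,
        parameter ADDR_W = 27
      )(
        input  logic              clk,
        input  logic              reset,
        // standard Avalon-MM slave signals
        input  logic [ADDR_W-1:0] address,
        input  logic [8:0]        burstcount,
        input  logic [DATA_W-1:0] writedata,
        input  logic              write,
        input  logic              read,
        output logic              waitrequest,
        output logic [DATA_W-1:0] readdata,
        // needed for "maximum pending read transactions"
        output logic              readdatavalid,
        // needed for "maximum pending write transactions"
        output logic [1:0]        response,
        output logic              writeresponsevalid
      );
        // Body omitted in this sketch: forward the transactions to the EMIF
        // Avalon-MM port and pulse writeresponsevalid (response = OKAY) once
        // the final beat of each write burst has been accepted.
      endmodule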

    "Will the interconnect resolve data width conversion and align the bursts?"

    Make sure there are no errors in the system messages; Platform Designer should take care of most of the interconnect between the interfaces.

    You may check chapter 5.1, Memory-Mapped Interfaces, for further details:

    https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/ug/ug-qpp-platform-designer.pdf#page=208

    If you have further questions on EMIF, I would recommend opening a new forum case for EMIF-related questions, as I am unfortunately not an expert in EMIF.

    Best Regards,
    Richard Tan

    p/s: If any answer from the community or Intel support is helpful, please feel free to give Kudos.

    • MichaelB (Occasional Contributor)

      Hi Richard,

      yes, I noticed this too - some signals are missing on the EMIF core and the Avalon CCB.

      • Is there an option in the Avalon CCB & EMIF core to enable those?
      • How can I create a custom component for a standard IP? Would this be a custom component instantiating the EMIF core?

      I tried to edit the interface of those IP cores in the component section of QSYS, but I cannot add further signals.

      I already read through the EMIF user guide (https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/stratix-10/ug-s10-emi.pdf) but I could not find any setting to enable/disable pending write transactions.

      Currently I am using the autogenerated interconnect from QSYS to resolve 256b AXI (BL = 16) to 512b Avalon (BL = 8).

      Here I did not see any errors in QSYS, only the hint that an Avalon adapter will be inserted between AXI <-> Avalon.

      Are there any special settings of the mm_interconnect I have to configure to do the bus conversion?

      From the Platform Designer documentation I assumed this would be done by the mm_interconnect automatically.

      Do you have any reference design (QSYS) where an AXI <-> Avalon connection with bit-width conversion + burst conversion is done?

      That would be helpful to understand the settings on both sides to align them for the best throughput performance.

      Kind regards,

      Michael

  • MichaelB (Occasional Contributor)

    Hi Richard,

    yes, I won't use the outstanding write transactions, since they are not supported by the EMIF core anyway.

    Furthermore, I don't think it is very beneficial to replace the standard component with a custom one, since outstanding transactions are not supported by the EMIF core itself anyway.

    Would you recommend using an explicit CCB, or the CCB autogenerated by the mm_interconnect between two Pipeline Bridges?

    Here I would then configure it without any outstanding transactions and only define the burst size.

    Would this be a valid design for high throughput from AXI master to DDR?

    Again, this is the main reason why I opened this thread.

    With > 3 Gbps I get overflows on the AXI master side - here I'm running at 160 MHz @ 256b and don't know why I have overflows.
    The DDR is running at 200 MHz @ 512b, so it doesn't make sense to me that I get overflows - we faced such issues previously, and we verified in simulation that the Avalon <-> AXI mm_interconnect does not respond fast enough with a valid indication.

    Furthermore, we figured out that doing the protocol conversion and the CDC in one step (AXI 160 MHz @ 256b <-> Avalon 200 MHz @ 512b) is even worse in throughput than doing the protocol conversion first and then only an Avalon <-> Avalon CDC.

    Here I really want to be sure to support a data rate > 10 Gbps, which should be possible with a burst size of 32 in a 160 MHz @ 256b domain.
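
    My rough numbers for the 10 Gbps target (again ignoring protocol overhead):

    \[
    \begin{aligned}
    10\ \text{Gbps} \,/\, 256\ \text{bit} &\approx 39\ \text{M beats/s}\\
    39\ \text{M beats/s} \,/\, 160\ \text{MHz} &\approx 24\%\ \text{bus utilisation}\\
    39\ \text{M beats/s} \,/\, 32\ \text{beats per burst} &\approx 1.2\ \text{M bursts/s}
    \end{aligned}
    \]

    So on average only about one beat every four clock cycles has to be accepted, which is why I expect this to be feasible.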

    Would you recommend just setting max. read/write outstanding to 0 and doing the connection with Avalon <-> AXI burst/bit-width conversion only?

    Kind regards,

    Michael

  • Hi Michael,

    I am building a system with PCIe endpoint <-> AXI <-> HBM2.

    I had a query regarding whether the AXI interface is able to support a larger burstcount. I have enabled a burstcount greater than 32 in the HBM controller.

    Still, when I increase the burstcount in software to greater than 2, the HBM controller doesn't respond with data.

    Did you face such issues interfacing AXI with HBM?

    Best Regards,

    Pramod

    • MichaelB (Occasional Contributor)

      Hi Pramod,

      we built a similar architecture, with multiple masters connected to a single HBM2 channel plus PCIe DMA access.
      Here the HBM2 burst controller seems to have a bug in some specific Quartus versions (20.4 and lower).

      We succeeded with a solution provided by Intel to patch the IP files after IP generation (yes, sadly you won't be able to use the common tool flow anymore). With Quartus 21.1 this fix is included in the IP again.


      https://www.intel.com/content/www/us/en/support/programmable/articles/000086781.html

      For us it was a long way to debug this. Hopefully this will help you.

      FYI:
      We haven't switched to 21.1 yet, because with 20.4 we have no timing violations in our design, whereas with 21.1 the retiming process does not work properly anymore. With it we got tremendous timing violations, and we did not get clear information from the Intel support team about why this happens after a version upgrade.
      Let's see if this is fixed in future versions...

      Kind regards,

      Michael

      • Pramod_atintel (New Contributor)

        Hi Michael,

        Thanks for the reply.

        I am using Quartus version 21.3. That should have solved the burst count issue, but I am still seeing the same problem.

        I will try replacing the auto-generated file as described in the link you sent and check.

        In 21.3 I am getting some timing violations, but most of them are false paths (SignalTap related).

        Did you use AVMM or AVST for the PCIe interface?

        Best Regards,

        Pramod

  • Hi Michael,

    In the DMA transfer between PCIe <-> HBM, I am seeing a very high data rate for the receive port (Rx, from FPGA to host) and much lower bandwidth for the transmit port (Tx, from host PC to FPGA).

    Did you face such issues?

    Regards,

    Pramod