Forum Discussion

Suresh430
New Contributor
1 month ago

Agilex 5 Sulfur Partial Write Issue on F2H ACE‑Lite I/F (256‑bit) with AXI Master of 128‑bit

Hello Intel Support Team,

I am working on an Agilex 5 Sulfur Development Board and implementing an HPS‑based design where a USB Host module (custom logic) acts as an AXI Master and performs memory accesses to SDRAM through the F2H ACE‑Lite interface.

I am seeing an issue related to partial writes when the AXI data width is translated from 128‑bit to 256‑bit before reaching the F2H bridge.


Design Summary

AXI Master (USB host logic)

  • Address width: 32 bits
  • Write data width: 128 bits
  • Write strobe (WSTRB): 16 bits

Interconnect Path (Platform Designer)

The Master AXI interface passes through the following autogenerated components:

  1. mm_interconnect_0_ace5lite_cache_coherency_translator
  2. ace5lite_cache_coherency_translator

ACE‑Lite Interface to F2H

  • Address width: 32 bits
  • Write data width: 256 bits
  • Write strobe: 32 bits

 

Observed Behavior (via SignalTap)

  1. The 128‑bit write data is properly expanded to 256‑bit by the translator.
  2. The 16‑bit WSTRB is correctly translated to 32‑bit, with only the lower or upper half asserted as expected.
  3. The AXI address falls correctly within the SDRAM region.
  4. The writes propagate through CCU → MPFE correctly (based on external visibility).
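
For reference, the upsizing described in observations 1 and 2 can be modelled in a few lines. This is a minimal Python sketch of generic AXI byte-lane placement, not the translator's actual RTL. The function name `upsize_128_to_256` is made up for illustration, and the sketch assumes address bit 4 selects which 16-byte half of the 256-bit bus carries the beat.

```python
from typing import Tuple


def upsize_128_to_256(addr: int, wdata: int, wstrb: int) -> Tuple[int, int]:
    """Place one 128-bit beat and its 16-bit strobe on a 256-bit bus.

    Assumption: address bit 4 picks the lower (0) or upper (1) 16-byte half.
    """
    if not 0 <= wdata < (1 << 128):
        raise ValueError("wdata must fit in 128 bits")
    if not 0 <= wstrb < (1 << 16):
        raise ValueError("wstrb must fit in 16 bits")
    upper_half = (addr >> 4) & 1
    shift_bytes = 16 * upper_half
    wdata256 = wdata << (8 * shift_bytes)  # data moves by whole bytes
    wstrb256 = wstrb << shift_bytes        # one strobe bit per byte lane
    return wdata256, wstrb256
```

A beat at an address with bit 4 set should therefore show a 32-bit strobe with only the upper 16 bits asserted, which matches what SignalTap shows above.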

Problem

When reading back SDRAM from software running on the HPS, we observe that:

👉 The entire 256‑bit word in SDRAM is modified,
even though
👉 only the 16 WSTRB bits covering one 128‑bit half were asserted on the ACE‑Lite interface.

SignalTap shows the correct WSTRB on the F2H side, but SDRAM readback indicates that the "inactive" 128‑bit lanes are also being overwritten.
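
One way to pin down exactly which lanes are overwritten is to fill the target region with a known pattern, issue a single partial write, and diff the readback per 16-byte lane. The helper below is a hedged sketch in Python. `changed_lanes` is a hypothetical name, and it assumes the "before" and "after" snapshots have already been captured from the HPS side by whatever means is available.

```python
from typing import List


def changed_lanes(before: bytes, after: bytes, lane_bytes: int = 16) -> List[int]:
    """Return the indices of the lanes whose contents differ between snapshots."""
    if len(before) != len(after):
        raise ValueError("snapshots must have the same length")
    return [
        i // lane_bytes
        for i in range(0, len(before), lane_bytes)
        if before[i:i + lane_bytes] != after[i:i + lane_bytes]
    ]
```

With a 64-byte snapshot, a write that honors the strobes should report exactly one changed lane. Two adjacent lanes changing would correspond to the behavior described in this post.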

Because the HPS subsystem is a hard macro, we cannot probe signals inside the CCU / MPFE / F2H bridge to see what is actually happening after the ACE‑Lite boundary.


Questions for Intel

We would greatly appreciate guidance on the following:

1. ACE‑Lite 256‑bit Partial Write Support

Are there any documented limitations or required settings for partial‑word writes on the 256‑bit F2H ACE‑Lite interface in Agilex 5?

2. MPFE / SDRAM Controller Behavior

Does the MPFE / CCU / SDRAM controller internally convert all writes to full‑width beats, regardless of WSTRB?
If so, is there a way to ensure correct byte‑enable behavior?

3. Required Platform Designer (Qsys) Settings?

Are there specific configuration requirements for:

  • The ACE‑Lite translator
  • Interconnect pipeline stages
  • Burst alignment
  • Address alignment for partial writes
  • Write‑data interleaving settings

4. Debugging Recommendations

Since internal HPS signals cannot be probed, is there:

  • Any documented method to trace ACE‑Lite transactions inside HPS?
  • Any diagnostic registers or trace capabilities in CCU/MPFE?
  • Any recommended debug flow for this type of issue?

3 Replies

  • KianHinT_altera
    Frequent Contributor

    Hi Suresh,

    Apologies for the delay in getting back to you. Regarding your questions, this is what I have found in the documentation so far:

     

     

    While the document states that partial writes are supported, the screenshot suggests there is no guarantee of how it responds to narrow data that is not aligned to 256 bits. The main suspect is the HPS Cache Coherency Unit (CCU) converting the partial writes into full-width writes via a read-modify-write (RMW) operation.

    Basically, the Agilex 5 HPS and CCU operate on 512-bit cache lines. From my understanding, your USB logic provides 128-bit partial data, qualified by WSTRB, to the 256-bit ACE-Lite interface, which in turn feeds a system that operates on 512-bit cache lines.

    Because the ACE-Lite interface is in coherent mode, the CCU cannot pass a 128-bit partial write directly to SDRAM, since the access granularity there is 512 bits. This is where the RMW comes in: the CCU reads 512 bits from cache or SDRAM (previous or some random data), applies your 128 bits of data (together with another 128 bits of garbage/random data) to that 512-bit line, and then writes the entire 512 bits to the MPFE and SDRAM controller.
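
For comparison, a read-modify-write that honors WSTRB would leave the non-strobed bytes of the line untouched. The sketch below is a behavioral model in Python only. It assumes a 64-byte line and a 32-byte beat, the name `rmw_merge` is hypothetical, and it says nothing about how the CCU is actually implemented.

```python
def rmw_merge(line: bytes, offset: int, wdata: bytes, wstrb: int) -> bytes:
    """Merge one write beat into a cache line, byte by byte.

    line   : current line contents (64 bytes assumed)
    offset : byte offset of the beat within the line
    wdata  : beat data, one entry per byte lane
    wstrb  : strobe bits; bit i set means byte lane i is written
    """
    if offset + len(wdata) > len(line):
        raise ValueError("beat does not fit inside the line")
    merged = bytearray(line)
    for lane, value in enumerate(wdata):
        if (wstrb >> lane) & 1:  # only strobed lanes replace old data
            merged[offset + lane] = value
    return bytes(merged)
```

If the readback differs from this model in lanes whose strobes were low, the extra bytes came from somewhere other than a strobe-respecting merge.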

    If you drive the full 256 bits from your custom logic in a 2-beat burst (AWLEN=1) to fill the full 512 bits, does the issue still happen? With a full 512-bit write, the CCU most probably will not do any RMW. With a single beat, the other half will probably still be some random 256-bit data that the CCU packed into the 512 bits.
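
The burst suggested here can be sanity-checked with the standard AXI arithmetic: a burst has AWLEN + 1 beats of 2^AWSIZE bytes each. The following is a small Python sketch. The helper names are hypothetical, and it assumes an INCR burst with all strobes asserted and a 64-byte (512-bit) line.

```python
def burst_bytes(awlen: int, awsize: int) -> int:
    """Total bytes in an AXI burst: (AWLEN + 1) beats of 2**AWSIZE bytes."""
    return (awlen + 1) * (1 << awsize)


def covers_full_line(awaddr: int, awlen: int, awsize: int, line_bytes: int = 64) -> bool:
    """True if the burst starts on a line boundary and spans exactly one line."""
    return awaddr % line_bytes == 0 and burst_bytes(awlen, awsize) == line_bytes
```

For the 256-bit interface, AWSIZE=5 (32 bytes per beat) with AWLEN=1 gives 64 bytes, so the burst covers a full line only when the start address is 64-byte aligned.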

     

    Thanks

    Regards

    Kian

    • Suresh430
      New Contributor

      Hi Kian,

      Thanks for your insights. Is there any width adapter IP readily available from Altera that supports merging 128-bit data into 256-bit on the write channel and splitting 256-bit data into 128-bit on the read channel?


      Regards
      Suresh

      • KianHinT_altera
        Frequent Contributor

        Hi Suresh,

        Sorry for the delay, I was out of office. I took a look at this previously; unfortunately, there is no ready-made IP that directly addresses this issue. The documentation only mentions that the user needs to add width-adaptation interconnect logic, but I didn't see any examples of this.

        I initially thought of the FIFO IP, but on its own it has no AXI write strobes (WSTRB) for partial writes, so it cannot be used directly. On the other hand, I think you are already using the ACE5-Lite cache coherency translator or width adapter, which performs implicit RMW operations, converting partial 128-bit writes into full 256-bit cache line operations.

        Have you considered using the F2SDRAM bridge instead of F2H, so that the USB logic writes directly to SDRAM, bypassing the cache coherency unit and its width requirements?

        Otherwise, I think it is possible to use the FIFO IP to handle the data packing and then the ACE5-Lite cache coherency translator IP to handle the transaction. You might need a custom state machine to manage the FIFO IP so that it waits until all the data (512 bits?) is packed, to avoid the RMW, and triggers the translator once 2x256 bits are available.
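
The packing idea can be prototyped as a behavioral model before writing any RTL. The sketch below is Python, not HDL, and `LinePacker` is a hypothetical name. It assumes the 128-bit beats arrive at sequential addresses starting on a 64-byte boundary, and it does not handle flushing a partially filled line.

```python
from typing import List, Optional


class LinePacker:
    """Collect four 16-byte beats and emit them as two 32-byte beats."""

    BEAT_IN = 16   # bytes per incoming 128-bit beat
    BEAT_OUT = 32  # bytes per outgoing 256-bit beat
    LINE = 64      # bytes per 512-bit line

    def __init__(self) -> None:
        self._buf = bytearray()

    def push(self, beat: bytes) -> Optional[List[bytes]]:
        """Add one beat; return the two output beats once a line is full."""
        if len(beat) != self.BEAT_IN:
            raise ValueError("expected a 16-byte beat")
        self._buf += beat
        if len(self._buf) < self.LINE:
            return None  # keep waiting, nothing is issued yet
        line, self._buf = bytes(self._buf), bytearray()
        return [line[i:i + self.BEAT_OUT] for i in range(0, self.LINE, self.BEAT_OUT)]
```

In hardware, the point at which `push` returns a list would correspond to the state machine starting a 2-beat burst with all strobes asserted.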

        Thanks

        Regards

        Kian