
Nachuan
New Contributor
5 days ago

Agilex 7 R-Tile CXL IP: D2H write bandwidth does not scale with dual CAFU AXI-MM ports

Device: Agilex 7 I-Series AGI027

Software: Quartus Prime Pro 24.3

IP Core: CXL Type 2 IP

Issue Description:

We are attempting to increase CXL Device-to-Host (D2H) write bandwidth by using both CAFU AXI-MM ports (port 0 and port 1) provided in the CXL Type 2 IP design example. However, our measurements show that enabling both AXI ports does not improve bandwidth as expected. For Non-cacheable writes, bandwidth remains unchanged when moving from one port to two ports. For Cacheable Owned writes, bandwidth decreases when using two ports. Please refer to the figures below for detailed results.

We are using the design example configured with two DCOH slices. To avoid potential DCOH contention, we've implemented address interleaving such that:

- AXI port 0 only accesses addresses corresponding to "even number × 64B"

- AXI port 1 only accesses addresses corresponding to "odd number × 64B"

Despite this, no bandwidth improvement is observed for either Non-cacheable or Cacheable Owned traffic.
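
To make the interleaving scheme described above concrete, here is a minimal Python sketch. It assumes 64-byte cache lines and that the port is chosen by the parity of the cache-line index. The names `axi_port_for_address` and `split_by_port` are hypothetical helpers for illustration and are not part of the CXL IP or the design example.

```python
CACHE_LINE_BYTES = 64  # interleaving granularity described in the post


def axi_port_for_address(addr):
    """Return the CAFU AXI-MM port (0 or 1) for a byte address.

    Hypothetical helper: even cache-line indices map to port 0,
    odd cache-line indices map to port 1.
    """
    line_index = addr // CACHE_LINE_BYTES
    return line_index % 2


def split_by_port(base, num_lines):
    """Split a run of consecutive cache lines into per-port address lists."""
    ports = {0: [], 1: []}
    for i in range(num_lines):
        addr = base + i * CACHE_LINE_BYTES
        ports[axi_port_for_address(addr)].append(addr)
    return ports


if __name__ == "__main__":
    # Four consecutive cache lines starting at address 0.
    for port, addrs in split_by_port(0, 4).items():
        print(port, [hex(a) for a in addrs])
```

With this mapping, a contiguous buffer is split evenly between the two ports, and no cache line is ever issued from both ports.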

Additionally, the non-cacheable bandwidth curve remains almost identical regardless of whether one or both AXI ports are used. This suggests that the exercised hardware path may contain a bottleneck or contention point within the CXL Type 2 IP, in either its soft or its hard portion.

We would like to understand how to resolve this bandwidth limitation. If it cannot be improved, we would appreciate clarification on the underlying cause of this behavior.

Thank you for your time and support.

1 Reply

  • Hi,

    Although two DCOH slices are configured, traffic from both CAFU AXI‑MM ports converges into shared traffic scheduling and routing logic inside the CXL IP. This logic is required by the CXL protocol and is not user configurable, and it ultimately limits D2H write bandwidth. The two AXI-MM ports are upstream of the shared scheduler, so once the downstream scheduling resources and the link are saturated, adding more injection sources cannot increase bandwidth. The DCOH slices provide pipeline parallelism and latency hiding. Because slice selection is dynamic and shared, increasing the number of DCOH slices improves utilization efficiency but not peak throughput.
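
The saturation argument above can be illustrated with a toy model, sketched here in Python. It assumes the shared scheduler and link behave as a single stage with a fixed capacity, so delivered bandwidth is the offered load capped by that capacity. The function name and all numbers are hypothetical, in arbitrary units, and are not measurements or specifications of the IP.

```python
def delivered_bandwidth(port_rates, shared_limit):
    """Toy model: offered load from all ports, capped by one shared stage."""
    offered = sum(port_rates)
    return min(offered, shared_limit)


# Hypothetical values in arbitrary units, chosen only to show the effect.
PORT_RATE = 30.0     # what a single AXI-MM port could inject on its own
SHARED_LIMIT = 25.0  # capacity of the shared scheduler and link

one_port = delivered_bandwidth([PORT_RATE], SHARED_LIMIT)
two_ports = delivered_bandwidth([PORT_RATE, PORT_RATE], SHARED_LIMIT)

if __name__ == "__main__":
    print("one port :", one_port)
    print("two ports:", two_ports)
```

In this model a second port only helps while the total offered load is below the shared limit. Once a single port already saturates the shared stage, the second port changes nothing, which matches the flat Non-cacheable curve described in the question. The model does not capture the extra contention that makes Cacheable Owned bandwidth decrease.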

     

    Address decoding is not the bottleneck, so address interleaving does not help here.

     

    For Non‑cacheable traffic, the system is link‑limited, so bandwidth remains flat. The limit is not in the AXI-MM ports or in the number of DCOH slices.

     

    For Cacheable Owned traffic, enabling both AXI-MM ports adds contention in coherence handling, tag lifetime, and ordering enforcement. This reduces efficiency and causes bandwidth to decrease. This behavior is inherent to the CXL Type‑2 IP architecture and cannot be corrected by configuration, port usage, or address interleaving.

     

    With the current architecture using a single Type‑2 endpoint, there is no direct way to further increase peak D2H bandwidth. Achieving higher bandwidth would require an architectural change, such as multiple CXL links or multiple Type‑2 endpoints. The AXI‑MM ports may still be used for traffic separation (for example, separating cacheable and non‑cacheable traffic), but not for bandwidth scaling. In short, the bandwidth limitation cannot be removed by AFU-side changes alone.

     

    Regards,

    Rong