Agilex 7 R-Tile CXL IP: D2H write bandwidth does not scale with dual CAFU AXI-MM ports

Question

Device: Agilex 7 I-Series AGI027
Software: Quartus Prime Pro 24.3
IP Core: CXL Type 2 IP

Issue Description:

We are attempting to increase CXL Device-to-Host (D2H) write bandwidth by using both CAFU AXI-MM ports (port 0 and port 1) provided in the CXL Type 2 IP design example. However, our measurements show that enabling both AXI ports does not improve bandwidth as expected. For Non-cacheable writes, bandwidth remains unchanged when moving from one port to two ports. For Cacheable Owned writes, bandwidth decreases when using two ports. Please refer to the figures below for detailed results.

We are using the design example configured with two DCOH slices. To avoid potential DCOH contention, we've implemented address interleaving such that:

- AXI port 0 only accesses addresses corresponding to "even number × 64B"
- AXI port 1 only accesses addresses corresponding to "odd number × 64B"

Despite this, no bandwidth improvement is observed for either Non-cacheable or Cacheable Owned traffic.

Additionally, the Non-cacheable bandwidth curve remains almost identical regardless of whether one or both AXI ports are used. This suggests that the exercised hardware path may contain a bottleneck or contention point within the CXL Type 2 IP (in either its soft or hard part).

We would like to understand how to resolve this bandwidth limitation. If it cannot be improved, we would appreciate clarification on the underlying cause of this behavior.

Thank you for your time and support.
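For readers following along, the even/odd 64B interleaving described above can be sketched in a few lines of Python. This is only an illustration of the address-to-port mapping, not code from the design example; the names `CACHELINE_BYTES`, `port_for_address`, and `port_addresses` are hypothetical, and the base address is assumed to be 64B-aligned.

```python
from typing import List

CACHELINE_BYTES = 64  # one 64B line per write, as in the question


def port_for_address(addr: int) -> int:
    """Return 0 if the 64B line index is even, 1 if it is odd."""
    return (addr // CACHELINE_BYTES) % 2


def port_addresses(base: int, num_lines: int, port: int) -> List[int]:
    """List the line addresses in the buffer that the given port would issue.

    Assumes `base` is 64B-aligned (an assumption of this sketch).
    """
    return [
        base + i * CACHELINE_BYTES
        for i in range(num_lines)
        if port_for_address(base + i * CACHELINE_BYTES) == port
    ]
```

With a 64B-aligned base, each port ends up with every other line, so the two ports never target the same line.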

rongy_altera · Accepted Answer

Hi,
Although two DCOH slices are configured, traffic from both CAFU AXI-MM ports converges into shared traffic scheduling and routing logic inside the CXL IP. This logic is required by the CXL protocol, is not user configurable, and ultimately limits D2H write bandwidth. The two AXI-MM ports are upstream of the shared scheduler, so once the downstream scheduling resources and the link are saturated, additional injection sources cannot increase bandwidth. DCOH slices provide pipeline parallelism and latency hiding. Because slice selection is dynamic and shared, adding DCOH slices improves utilization efficiency but not peak throughput.
Address decoding is not the bottleneck, so address interleaving does not help here.
For Non-cacheable traffic, the system is link-limited, so bandwidth remains flat. The limit is not in the AXI-MM ports or in the number of DCOH slices.
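The link-limited case can be pictured with a toy model: the delivered bandwidth is the smaller of the total offered load and the capacity of the shared downstream stage. The sketch below is only an illustration under that simplifying assumption. The function name `delivered_bandwidth` is hypothetical, and the values are normalized placeholders, not measurements from the IP.

```python
from typing import Sequence


def delivered_bandwidth(port_rates: Sequence[float], shared_limit: float) -> float:
    """Aggregate bandwidth behind a shared stage.

    Toy model: the ports' offered loads add up, but the result is capped
    by the capacity of the shared scheduler/link (`shared_limit`).
    """
    return min(sum(port_rates), shared_limit)


# Normalized, illustrative values only (1.0 = capacity of the shared stage).
one_port = delivered_bandwidth([1.0], shared_limit=1.0)
two_ports = delivered_bandwidth([1.0, 1.0], shared_limit=1.0)
print(one_port, two_ports)  # both equal the shared limit
```

In this model a second port helps only while the first port alone cannot saturate the shared stage. The model does not capture the extra contention that makes Cacheable Owned bandwidth decrease.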
For Cacheable Owned traffic, enabling both AXI-MM ports adds contention in coherence handling, tag lifetime, and ordering enforcement, which reduces efficiency and causes bandwidth to decrease. This behavior is inherent to the CXL Type 2 IP architecture and cannot be corrected by configuration, port usage, or address interleaving.
With the current architecture using a single Type 2 endpoint, there is no direct way to further increase peak D2H bandwidth. Achieving higher bandwidth would require an architectural change, such as multiple CXL links or multiple Type 2 endpoints. The AXI-MM ports may still be used for traffic separation (for example, separating cacheable and non-cacheable traffic), but not for bandwidth scaling. In short, the bandwidth limitation cannot be removed by AFU-side changes alone.
Regards,
Rong
