Hi John,
Thanks for the reply. I totally missed that the Agilex 7 benchmarks were run on the PCIe-based design. I also noticed that the M2M example command logs in the AI Suite handbook show a similar delta. For ResNet-50 on the Agilex 7 FPGA I Series Transceiver-SoC Development Kit, they show a system throughput of 27 fps vs. an IP throughput of 123. So you are right: dla_benchmark is not a suitable way to measure total system throughput.
I will look into and try the 4K AI camera example you shared, and get back to you with my strategy for benchmarking AI Suite performance.
Thanks