Altera_Forum
Honored Contributor
9 years ago
Poor heterogeneous memory performance in Quartus 14.1
EDIT: This happens in Quartus 15.1 and 16.0 as well.
Hi all,

I'm seeing low performance when using multiple memory systems on the DE5Net board. With two memory systems, the non-default (non-primary) memory system gets 1/10 the expected performance of a single memory system. See below for more information.

I'm using three custom Board Support Packages (BSPs) on the DE5Net:

1. Uses the 2 DDRs (very similar to the vendor-provided board)
2. Uses the 4 QDRs
3. Uses the 2 DDRs and 4 QDRs available on the Terasic DE5Net board. The DDR subsystem is the "default"/primary memory system.

If I compile vector add on either the custom 2 DDR board (1) or the custom 4 QDR board (2) with a single work group and no vectorization, I get 250 M IOPS on a kernel running at 250 MHz (1 IOP/cycle). So it stands to reason that the DDR and QDR memories are being used correctly, and that neither memory is hitting a bandwidth limit.

However, if I compile for (3), I get interesting performance results. If I tell aoc to use the DDR system for the input and output vectors, I get 250 M IOPS. If I tell aoc to use the QDR system, I get 25 M IOPS.

Oddly enough, if I switch the primary system (i.e. QDR is primary), the results switch too: the DDR system gets 25 M IOPS for the input and output vectors, and the QDR system gets 250 M IOPS.

As far as I can tell, all of the clocks have met timing, all clock-crossing buffers are sufficiently deep, and the kernel clocks are running at ~250 MHz.

So, I'm stumped. Does anyone have any ideas about what is causing this issue?

Thanks
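
For reference, here is a quick sketch of the arithmetic behind the numbers above: operations per kernel clock cycle, and the memory traffic each case implies. The 4-byte element size and the helper names are assumptions for illustration only (the element type is not stated above), and the kernel clock is taken as exactly 250 MHz.

```python
# Rough throughput arithmetic for the vector add results quoted above.
# ASSUMPTIONS (not from the measurements): 32-bit elements, and three
# memory accesses per operation (two input reads, one output write).

KERNEL_CLOCK_HZ = 250e6  # kernel clock, taken as exactly 250 MHz


def ops_per_cycle(iops: float, clock_hz: float = KERNEL_CLOCK_HZ) -> float:
    """Operations completed per kernel clock cycle."""
    return iops / clock_hz


def implied_bandwidth(iops: float, bytes_per_element: int = 4,
                      accesses_per_op: int = 3) -> float:
    """Bytes per second moved to and from memory at the given op rate."""
    return iops * bytes_per_element * accesses_per_op


if __name__ == "__main__":
    cases = [
        ("primary memory system", 250e6),
        ("non-primary memory system", 25e6),
    ]
    for label, iops in cases:
        print(f"{label}: {ops_per_cycle(iops):.2f} ops/cycle, "
              f"{implied_bandwidth(iops) / 1e9:.2f} GB/s implied traffic")
```

Under those assumptions, the slow case corresponds to one operation every ten kernel cycles rather than one per cycle, which is why it does not look like a plain memory bandwidth limit.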