Altera_Forum
Honored Contributor
17 years ago
Maximizing read bandwidth of HP DDR2 memory controller
Hi,
I'm trying to maximize the read bandwidth through the High Performance (HP) DDR2 memory controller using the Stratix III devkit and the included Micron DIMM. I'm getting less than I had hoped for (~50% when doing 8 x 256b read bursts, with waitrequest backpressure slowing things down). I'll keep trying tricks to coax a few more % of performance out (at the expense of latency, additional buffering, longer bursts, etc.), but I'd really appreciate it if anyone has more insight into what works best with this controller, whether there's a way to use the controller differently, or even whether I could redesign a module or two of it. (I'm hoping not to have to design my own controller or buy another one for this relatively simple need.)

Here's the specific scenario, to help bound this potentially huge question:
- I'm only doing reads (when I care about performance), no writes. I'd happily sacrifice write performance if necessary.
- Only one Avalon master (a DMA arbiter) issues all the reads, and it is already 'directly' connected to the memory controller (no burst or width adapters, no bridges or clock crossings, just the SOPC arbitration logic with idle/non-requesting masters).
- I'm using the half-rate controller (for max memory clock frequency), which means there's a 256b-wide data bus on the Avalon side. The controller doesn't use bursts, so the master doesn't use Avalon bursts either, although it can and will generate sequentially addressed requests.
- I have more than 5 large read streams to fetch from memory, so I think it should be possible with DDR2 to fetch large enough bursts and arrange the address bits so that bank interleaving hides the RAS/CAS overheads.
- The consumers of these data streams are relatively slow, so I'm prefetching and buffering read data to get better read bandwidth.
- Based on the docs I've seen, I think the controller only has a 4-deep input request buffer (non-burst requests!), so I don't think it can see far enough ahead to hide the overheads. My guess is that it just takes each request, activates a new page when needed, and then moves on to the next request, following all the DDR2 timing rules of course. That would make it hard to do anything fancy in terms of bank interleaving. At best, it could be opportunistic about leaving banks open and hoping a later request hits an open bank again (but that can cause other problems, so it might be odd to do this by default).
- So I'm assuming I can increase bandwidth by increasing the sequential address (~burst) length, but that gets more expensive (buffers), less responsive to changes in address (wasted prefetch), and adds latency when two or more new addresses show up for new streams. I might also be able to do some multi-bank request interleaving if pages are indeed left open.
- But what I'd really like is to get near the maximum data rate. If 4-bank interleaving were used to hide the RAS/CAS overheads, couldn't multiple streams (plus a smart mapping of address bits to bank/row/column address bits) get close to the maximum theoretical bandwidth (minus refresh effects) with a reasonably sized burst (e.g., 8 x 256b or so, depending on the memory clock frequency vs. the memory timing parameters)? It might not be the most power-efficient approach, since pages would be opened and closed more often than necessary, but it seems straightforward... just not with this Avalon interface?

Like I said, I'm no expert, so hopefully I'm just missing something. Any tips, insight, achieved performance numbers, etc. would be really appreciated. I'll take any tips on getting the last bits of performance out of the HP DDR2 memory controller, but I'm hoping someone will tell me I can modify an existing DDR2 controller to give me lower-level control over the command sequence.
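To make the address-bit mapping idea concrete, here is a minimal Python sketch of the remap, assuming a hypothetical DIMM geometry (64-bit data bus, 10 column bits, 4 banks, 14 row bits; check the real Micron datasheet) and assuming the remap is done in the DMA master before the address reaches the controller. It puts the bank bits directly above one 8 x 256b burst's worth of column bits, so consecutive bursts of a stream rotate across the banks instead of staying in one. `DramGeometry`, `take_bits` and `map_interleaved` are illustrative names, not part of any Altera IP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DramGeometry:
    """Hypothetical DIMM geometry; replace with the real datasheet values."""
    bus_bytes: int = 8        # 64-bit DIMM data bus: one column = 8 bytes
    col_bits: int = 10        # 1024 columns per row
    bank_bits: int = 2        # 4 banks
    row_bits: int = 14        # 16384 rows
    burst_col_bits: int = 5   # 8 x 256b Avalon words = 32 columns per burst


def take_bits(value: int, width: int) -> tuple[int, int]:
    """Split off the low `width` bits: returns (low_bits, remaining_value)."""
    return value & ((1 << width) - 1), value >> width


def map_interleaved(byte_addr: int, g: DramGeometry = DramGeometry()) -> tuple[int, int, int]:
    """Map a linear byte address to (row, bank, col).

    Layout from LSB up: [burst column bits][bank bits][upper column bits][row bits],
    so each consecutive 256-byte burst lands in the next bank.
    """
    column_index = byte_addr // g.bus_bytes
    col_lo, rest = take_bits(column_index, g.burst_col_bits)
    bank, rest = take_bits(rest, g.bank_bits)
    col_hi, rest = take_bits(rest, g.col_bits - g.burst_col_bits)
    row, _ = take_bits(rest, g.row_bits)  # addresses past the device size wrap
    col = (col_hi << g.burst_col_bits) | col_lo
    return row, bank, col


if __name__ == "__main__":
    for burst in range(6):
        print(burst, map_interleaved(burst * 256))
```

For example, byte addresses 0, 256, 512 and 768 land in banks 0 to 3 of row 0, and 1024 wraps back to bank 0 at column 32. In hardware this is just a fixed permutation of address wires; anything that writes the data has to apply the same permutation, and whether it actually buys bandwidth depends on the controller overlapping activates across banks.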
Thanks,
-Brian

PS: using Quartus II 8.0 SP1, SOPC Builder, Stratix III speed grade 3, ...
PPS: oh, an explanation of what the HP DDR2 controller does or doesn't do (e.g., does it keep a page open if an intermediate request goes to another bank?) would also help keep me from trial-and-error reverse engineering.
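As a rough back-of-envelope check on the burst-length trade-off and the open/closed page question, the sketch below compares two idealized cases in memory-clock cycles: a controller that serializes every burst behind a precharge, activate and CAS latency, and one that rotates bursts across banks so activates overlap other banks' data. The timing values are assumed, roughly DDR2-667-class numbers, not the devkit DIMM's datasheet values, and the model ignores refresh, the controller's own pipeline bubbles and waitrequest stalls. `Ddr2Timing` and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ddr2Timing:
    """Timing in memory-clock cycles. Assumed example values (roughly
    DDR2-667 class); substitute the numbers from the DIMM datasheet."""
    cl: int = 5       # CAS latency
    t_rcd: int = 5    # activate to read
    t_rp: int = 5     # precharge period
    t_rc: int = 19    # activate to activate, same bank
    t_rrd: int = 3    # activate to activate, different banks
    t_faw: int = 13   # four-activate window (if the device specifies one)


def data_cycles(avalon_words: int, beats_per_word: int = 4) -> float:
    # Half-rate, 64-bit DIMM: one 256b Avalon word = 4 DDR beats = 2 mem clocks.
    return avalon_words * beats_per_word / 2


def serialized_efficiency(avalon_words: int, t: Ddr2Timing = Ddr2Timing()) -> float:
    """Every burst opens a new row and nothing overlaps:
    precharge + activate + CAS latency, then the data."""
    d = data_cycles(avalon_words)
    return d / (d + t.t_rp + t.t_rcd + t.cl)


def interleaved_efficiency(avalon_words: int, banks: int = 4,
                           t: Ddr2Timing = Ddr2Timing()) -> float:
    """Bursts rotate across `banks` banks and each activate overlaps other
    banks' data. A burst slot must cover the data time, tRRD, tFAW/4, and
    tRC for the same bank, which is revisited every `banks` slots."""
    d = data_cycles(avalon_words)
    slot = max(d, t.t_rrd, t.t_faw / 4, t.t_rc / banks)
    return d / slot


if __name__ == "__main__":
    for words in (1, 2, 4, 8, 16):
        print(f"{words:2d} x 256b: serialized {serialized_efficiency(words):.2f}, "
              f"4-bank interleaved {interleaved_efficiency(words):.2f}")
```

With these assumed numbers the serialized case gives about 0.52 at 8 x 256b and still only about 0.68 at 16 x 256b, while the 4-bank case is limited by tRC/4 only for very short bursts (about 0.42 at 1 x 256b, 0.84 at 2 x 256b, 1.00 from 4 x 256b up). The serialized figure landing near the observed ~50% is consistent with, but doesn't prove, a row being closed and reopened on every burst.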