Forum Discussion

Altera_Forum
14 years ago

DMA on a DE0

Hi all,

Can someone verify for me that they have DMA to SDRAM working on a DE0 board? I've been trying to get it to work for a couple of days, but no joy. Here's how it's set up in SOPC Builder:

http://0x0000ff.com/imgs/fpga/dma.png

... and all I'm doing is instantiating the SOPC system, then creating a BSP from it with the included 'memtest.c' example. What I see when typing in values that won't conflict with the code (since the test is destructive) is that everything passes, but the DMA hangs...

http://0x0000ff.com/imgs/fpga/dma-out.png

'ramClock' is shifted relative to 'cpuClock' (the clock phase shift on 'c1' is set to -3ns). I'm not sure if it matters, but the 'compensated for' clock is set to 'c0' (which is the cpuClock output).

I am getting some timing violations on the 'altera_reserved_tck' clock since switching to the 'standard' Nios II CPU, but the other clocks in the system seem to be within range, and the 'altera_reserved_tck' clock seems to be just for JTAG anyway...

http://0x0000ff.com/imgs/fpga/dma-clocks.png

Assuming the clock isn't the cause, is there anything else I ought to be doing? Do you need any other support modules to get DMA working?

Assuming the clock is the cause, is there a way to isolate the 'altera_reserved_tck' clock to just the JTAG circuit?

This is all using Quartus 10.1, if it matters. That's the revision I've had most success with, to date.

Any help gratefully appreciated :)

Simon

14 Replies

  • Altera_Forum

    Okay okay, I give up [grin]. The modular SGDMA it is - I can take a hint, honest guv, once I've been beaten about the head with it a few times :)

    So, I tried porting the NEEK-based SGDMA - not the frame-buffer one, since I don't want to use the video-processing pipeline (the licence is too expensive for my bank account!), just the normal DMA one.

    I couldn't get it to work last night (to be fair, I didn't have too much time to try). I wrote a top-level Verilog module to instantiate the system (I've never used schematic entry) and linked it up with the SDRAM controller rather than the DDR SDRAM controller. The system generated and loaded onto the DE0, and the software program compiled, but it failed to verify when using 'nios2-download -g'.

    So, there's one (or more, I guess :)) of three things wrong that I can think of:

    - The code is being overwritten somehow. It's being loaded in at 0x0 (where the SDRAM starts) and perhaps something else likes that memory-location. I might try relocating SDRAM to a higher location.

    - My SDRAM clock is not synchronising correctly. I tried to understand the constraint-based sdc file included, but it seems to be tied up with the DDR controller. I don't have one of those :(. I did try changing the SDRAM clock offset to -2.5, -3, -3.5 ns without any success - and it's worked at all those settings for me before.

    - My reset logic is screwed up somehow, and the CPU is never coming out of reset.

    My first choice was to use the on-chip RAM, but I can only make 32k of RAM, and the test program doesn't seem to fit into that (I get link-time errors), so the code has to be in SDRAM. I did change the RAM_BASE and RAM_SPAN so that the tested-area of RAM starts at 1MB and extends for 6MB. It's *possible* that isn't leaving enough space for the code/stack but I doubt it.

    I'll have another look at it tonight. Maybe I just ought to go buy a NEEK...

    Thanks for all the advice so far, by the way ... It may be frustrating at times, but I am (slowly) learning stuff, and that's why I started doing this :)

    Simon.
  • Altera_Forum

    If you get a message about "m_state" ........ and a bunch of text that is not really all that useful then I would start suspecting that reset could be an issue. If the download just says that the validation failed then it could be any number of things including a memory problem.

    The SDR SDRAM controller doesn't make use of the first 0x20 words in the memory for calibration, so you don't have to worry about that. I would recommend using a non-volatile memory for the reset vector, since that's where the processor starts fetching instructions when it comes out of reset.

    Your code probably wouldn't be located anywhere but the first ~80kB of the memory. However, the heap is probably located in the memory you are testing, so I wouldn't recommend doing that. You could hack the memtest software to allocate the memory instead of having it ask you over the terminal. That way it'll be safe for the test to clobber it, since it was allocated either at compile time or at run time.
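
    A minimal sketch of the "allocate it instead of asking" idea, assuming a plain C heap. The helper names (`pattern_test`, `run_allocated_test`) and the test pattern are hypothetical and are not taken from memtest.c; the point is only that the region comes from `malloc`, so the destructive test cannot land on the code, stack or heap bookkeeping.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Hypothetical helper: fill a region with an index-derived pattern,
     * read it back, and return the number of mismatching words. */
    static size_t pattern_test(volatile uint32_t *buf, size_t words)
    {
        size_t errors = 0;
        assert(buf != NULL);
        for (size_t i = 0; i < words; i++)
            buf[i] = (uint32_t)i ^ 0xA5A5A5A5u;
        for (size_t i = 0; i < words; i++)
            if (buf[i] != ((uint32_t)i ^ 0xA5A5A5A5u))
                errors++;
        return errors;
    }

    /* Take the test region from the heap rather than from a base address
     * typed in at the terminal. Returns -1 if the allocation fails,
     * otherwise the number of mismatching words. */
    static long run_allocated_test(size_t bytes)
    {
        uint32_t *buf = malloc(bytes);
        if (buf == NULL)
            return -1;
        size_t errors = pattern_test(buf, bytes / sizeof *buf);
        free(buf);
        return (long)errors;
    }
    ```

    Calling `run_allocated_test(64 * 1024)` exercises a 64kB region. On the target with a data cache enabled, the CPU may be checking the cache rather than the SDRAM, so an uncached allocation (the HAL has `alt_uncached_malloc` for this, as far as I recall) is probably the better fit for a DMA test.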
  • Altera_Forum

    Okay, it seems there was a fourth option for why it wasn't working... blind idiocy... In my defence, it was 1am when I was playing with this ... but I'd been running the 'nios2-configure-sof' command and it was picking up the 'xxx.sof' file, not the 'xxx_time_limited.sof' file...

    When I removed the xxx.sof file (with some prejudice ...) it all seems to work well - at least if I run at <= 100 MHz. I can't get the SDRAM to go higher than that, even if Quartus says fMax is 144.11MHz, even if I perturb the sdram-clock delay around a median of -3ns.

    Anyway, now I'm getting:

    --- Quote Start ---
    Test complete with a throughput of 39MB/s.
    Test complete with a throughput of 39MB/s.
    Test complete with a throughput of 38MB/s.
    Test complete with a throughput of 39MB/s.
    Test complete with a throughput of 38MB/s.
    Test complete with a throughput of 39MB/s.
    Test complete with a throughput of 39MB/s.
    Test complete with a throughput of 39MB/s.
    Test complete with a throughput of 39MB/s.
    Test complete with a throughput of 39MB/s.
    Test complete with a throughput of 39MB/s.
    Test complete with a throughput of 39MB/s.
    Test complete with a throughput of 39MB/s.
    Test complete with a throughput of 39MB/s.
    Test complete with a throughput of 39MB/s.
    Test complete with a throughput of 39MB/s.
    Test complete with a throughput of 38MB/s.
    Test complete with a throughput of 39MB/s.
    --- Quote End ---

    ... which is roughly 20% of the theoretical bandwidth of the SDRAM. I've noticed you saying that you've seen almost 100% utilization previously, so I presume there are things I can do to tweak that to make it better. Presumably the FIFO interface to the real VGA controller (rather than this test) would help matters as well.
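
    For reference, the "roughly 20%" figure can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: a 16-bit SDR SDRAM data bus (which is what the DE0 has, as far as I know), the 100 MHz clock mentioned above, and one transfer per clock with no refresh or row-activation overhead. How the test program counts its MB/s (for example, whether reads and writes are both counted) is not stated in the thread, so treat the percentage as indicative only. The function names are hypothetical.

    ```c
    #include <assert.h>

    /* Peak SDR SDRAM bandwidth in MB/s, assuming one bus-wide transfer
     * per clock cycle and no refresh or activation overhead. */
    static double sdr_peak_mb_per_s(unsigned bus_bits, double clock_mhz)
    {
        assert(bus_bits % 8 == 0);
        /* bytes per transfer x million transfers per second */
        return (bus_bits / 8) * clock_mhz;
    }

    /* Measured throughput as a percentage of the peak figure. */
    static double utilisation_percent(double measured_mb_per_s,
                                      double peak_mb_per_s)
    {
        assert(peak_mb_per_s > 0.0);
        return 100.0 * measured_mb_per_s / peak_mb_per_s;
    }
    ```

    With these assumptions, `sdr_peak_mb_per_s(16, 100.0)` gives 200 MB/s, and `utilisation_percent(39.0, 200.0)` gives 19.5%, which matches the estimate above.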

    So, now I have a working modular SGDMA system on the DE0, I can start to try getting the VGA core to interface to it :) The one in the NEEK is way more general than I need - the DE0 only has 4 bits each for R,G,B so I'll be adopting a standard colour definition of 16 bits (4 each for R,G,B, 2 for alpha, 2 for depth), which removes the need for re-sampling, colour conversion, etc.
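
    One possible way to lay out that 16-bit pixel in software, as a sketch only. The field order (red in the top nibble, then green, blue, 2 bits of alpha, 2 bits of depth) is an assumption made for illustration; the post gives the field widths but not their positions, and the function names are hypothetical.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Assumed layout: [15:12] R, [11:8] G, [7:4] B, [3:2] alpha, [1:0] depth. */
    static uint16_t pixel_pack(unsigned r, unsigned g, unsigned b,
                               unsigned alpha, unsigned depth)
    {
        assert(r < 16 && g < 16 && b < 16 && alpha < 4 && depth < 4);
        return (uint16_t)((r << 12) | (g << 8) | (b << 4) | (alpha << 2) | depth);
    }

    /* Field extractors for the same assumed layout. */
    static unsigned pixel_red(uint16_t p)   { return (p >> 12) & 0xFu; }
    static unsigned pixel_green(uint16_t p) { return (p >> 8) & 0xFu; }
    static unsigned pixel_blue(uint16_t p)  { return (p >> 4) & 0xFu; }
    static unsigned pixel_alpha(uint16_t p) { return (p >> 2) & 0x3u; }
    static unsigned pixel_depth(uint16_t p) { return p & 0x3u; }
    ```

    Because each colour field is already 4 bits wide, the R, G and B nibbles could go straight to the 4-bit VGA outputs without any scaling, which is the simplification described above.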

    Once that is working, I can start to implement the Blitter/GPU side of things, and it starts to get really interesting :)

    Thanks for all the help :) Oh, and it may not be a great implementation (the .sdc file is basic ...) but if you want the DE0 project to go alongside the NEEK one, you're welcome to it :)

    Simon
  • Altera_Forum

    The reason the performance is so bad is that the read and write masters are ping-ponging for access to the SDRAM. If you increase the arbitration share of each master, that should reduce this effect. If you were reading from the memory and writing the data somewhere else, this problem would also go away. In the end your VGA output will be the bottleneck, so as long as you have enough memory bandwidth to keep the video pipeline fed you should be fine.

    The only way to get the SDRAM operating higher than 100MHz reliably would be to constrain the interface, since Quartus II has no clue what kind of off-chip timing relationships are needed (that's what the constraints are for).