Forum Discussion

Altera_Forum
Honored Contributor
14 years ago

PCIe bus master random access to all host memory

I'm not sure if the PCI Express MegaCore hard IP can do what I want. I'll describe what I'm trying to achieve.

I'd like to set up a bus-master scatter-gather DMA between on-device memory and a host memory buffer. The host memory buffer is contiguous in virtual address space, but once it is locked down into 4 KiB physical memory pages, the physical addresses of those pages are scattered randomly all over physical memory - a classic use case for scatter-gather DMA.
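To make the scatter-gather case concrete, here is a minimal sketch of turning the per-page physical addresses of a locked buffer into a list of contiguous runs. The `sg_entry` layout and the `build_sg_list` name are invented for illustration and do not correspond to any particular DMA core's descriptor format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u  /* 4 KiB pages, as in the description above */

/* HYPOTHETICAL descriptor: one physically contiguous run of host memory. */
struct sg_entry {
    uint64_t phys_addr;  /* physical address of the first byte of the run */
    uint32_t length;     /* run length in bytes */
};

/*
 * Build a scatter-gather list from the physical addresses of the locked
 * pages, given in virtual-address order. Neighbouring pages that happen to
 * be physically contiguous are merged into a single entry.
 * Returns the number of entries written, or 0 if `out` is too small.
 */
size_t build_sg_list(const uint64_t *page_phys, size_t n_pages,
                     struct sg_entry *out, size_t max_entries)
{
    size_t n = 0;
    for (size_t i = 0; i < n_pages; i++) {
        assert((page_phys[i] % PAGE_SIZE) == 0);   /* pages are 4 KiB aligned */
        if (n > 0 &&
            out[n - 1].length <= UINT32_MAX - PAGE_SIZE &&
            out[n - 1].phys_addr + out[n - 1].length == page_phys[i]) {
            out[n - 1].length += PAGE_SIZE;        /* extend the previous run */
        } else {
            if (n == max_entries)
                return 0;                          /* caller's array is full */
            out[n].phys_addr = page_phys[i];
            out[n].length = PAGE_SIZE;
            n++;
        }
    }
    return n;
}
```

In the worst case nothing merges and the list has one entry per 4 KiB page, which is the situation the question below is about.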

I'm not sure how this fits in with the Avalon-MM-to-PCI Express address translation table. Would I have to set up a translation table entry for each 4 KiB page, or could I set up a few translation table entries to cover the whole host physical address space (or at least the bottom 3 GB)? I'm more familiar with "PCI to local bus" bridges (e.g. the ones made by PLX Technology) that have completely separate address spaces on each side of the bridge.
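One rough way to reason about this question: assume each translation table entry maps one naturally aligned window of host address space, so the upper address bits must match the entry and the lower bits pass through as an offset. The sketch below uses a 1 MiB window purely as an illustrative assumption. `WINDOW_BITS`, `window_base`, `window_offset` and `fits_one_window` are made-up names, and the real window size and entry count depend on how the core is parameterised.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: each translation entry maps one naturally aligned window of
 * 2^WINDOW_BITS bytes. 20 bits (1 MiB) is only an illustrative value. */
#define WINDOW_BITS 20u
#define WINDOW_MASK ((UINT64_C(1) << WINDOW_BITS) - 1)

/* Upper address bits: the value a translation entry would have to hold. */
uint64_t window_base(uint64_t host_phys)
{
    return host_phys & ~WINDOW_MASK;
}

/* Lower address bits: the offset the DMA master drives within the window. */
uint32_t window_offset(uint64_t host_phys)
{
    return (uint32_t)(host_phys & WINDOW_MASK);
}

/* Nonzero if [host_phys, host_phys + len) lies inside a single window,
 * i.e. one translation entry is enough for the whole transfer. */
int fits_one_window(uint64_t host_phys, uint32_t len)
{
    assert(len > 0);
    return window_base(host_phys) == window_base(host_phys + len - 1);
}
```

Under this assumption, a 4 KiB-aligned page always fits in one window as long as the window is at least 4 KiB, and N entries can statically cover at most N windows' worth of address space. Anything beyond that means rewriting an entry before the transfer.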

The only example I've got to go on is the WinDriver code for the "PCI Express in Qsys Example" design, but that's nothing like what I want as it allocates a contiguous area of physical memory in the bottom 16 MiB of physical memory.

14 Replies

  • Altera_Forum
    Honored Contributor

    You need to use one of the DMA controllers to do the burst writes into the Avalon slave interface of the PCIe block.

    Before requesting the transfer you'll need to set the address translation tables so that the correct physical address bits are used.

    Last time I looked, none of the DMA controllers supported 64-bit addressing on one port, so it isn't possible to avoid the address translation tables in the PCIe block.

    I also remember having difficulty configuring 32-bit address transparency.

    I can't imagine that you'd want to link the PCIe Avalon slave to a normal master (like a Nios CPU), since you really don't want to stall while the transfer takes place. Better would be a 'single transfer (degenerate) DMA controller' to which you write the physical address and data and then poll for completion.
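    As a sketch of what driving such a 'single transfer' controller might look like from software: the register block below is entirely hypothetical (no such Altera component is implied), and `xfer_start` / `xfer_wait` are made-up names. The point is only the pattern: post the address and data, return immediately, then poll a BUSY bit with a bounded loop so that a transfer that never completes is reported instead of hanging the CPU.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* HYPOTHETICAL register block, invented for illustration only. */
    struct xfer_regs {
        volatile uint32_t addr_lo;   /* host physical address, low 32 bits  */
        volatile uint32_t addr_hi;   /* host physical address, high 32 bits */
        volatile uint32_t data;      /* word to write to the host           */
        volatile uint32_t control;   /* bit 0: GO                           */
        volatile uint32_t status;    /* bit 0: BUSY                         */
    };

    #define CTRL_GO     0x1u
    #define STATUS_BUSY 0x1u

    /* Post one word and return at once, so the CPU does not stall on the bus. */
    void xfer_start(struct xfer_regs *r, uint64_t host_phys, uint32_t word)
    {
        assert(r != NULL);
        r->addr_lo = (uint32_t)host_phys;
        r->addr_hi = (uint32_t)(host_phys >> 32);
        r->data = word;
        r->control = CTRL_GO;        /* written last: starts the transfer */
    }

    /* Poll BUSY up to max_polls times. Returns 0 once the controller is idle,
     * or -1 if BUSY is still set after the last poll. */
    int xfer_wait(struct xfer_regs *r, unsigned max_polls)
    {
        assert(r != NULL);
        while (max_polls-- > 0) {
            if ((r->status & STATUS_BUSY) == 0)
                return 0;
        }
        return -1;
    }
    ```

    On real hardware `r` would point at the controller's memory-mapped registers; for experimenting on a PC it can simply point at a zero-initialised struct in RAM.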

    Another useful item would be a memory block that is dual ported as an Avalon slave and to 'PCIe dma logic'. You could then arrange for the data to be in this special memory block and directly request a PCIe transfer to/from it. This would save resources and reduce latency.

    Unfortunately Altera don't seem to be making this easy to use.
  • Altera_Forum
    Honored Contributor

    Hi dsl

    Thanks for your reply!

    Yes, Altera doesn't seem to make this easy to use. When my data is ready inside the dual-port RAM, how can I initiate (request or start) a PCIe transfer? Is there any status or flag signal that the application software can poll for this data transfer?
  • Altera_Forum
    Honored Contributor

    I'm now trying to get this to work, and of course it doesn't.

    I don't need high throughput but do need long TLPs (read and write) and asynchronous operation (controlled by a Nios CPU).

    The 'simple' DMA controller ought to work, but when I request a transfer all that happens is the 'BUSY' bit in the status register is set.

    Unfortunately it is a bit difficult to connect the JTAG port to the board I'm using, making SignalTap unusable.

    It might just be that I've configured the DMA controller incorrectly (in SOPC Builder).

    I haven't found any information on the required parameters for PCIe DMA.

    I decided that the 'best bet' was to 'enable burst transfers' and I set the 'maximum burst size' to 128.

    There is a strange comment in the documentation that the maximum transfer length must be less than the maximum burst length - is this really true?

    The longest transfer I need to do is 0x120 bytes, most will be shorter.

    Possibly I should be enabling (and using) 64-bit transfers?

    Any ideas what I've done wrong?

    The SGDMA is overcomplex for what I need.
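    For reference, the arithmetic behind the TLP sizes involved can be sketched as follows. This does not diagnose the stuck BUSY bit. It only shows how a transfer breaks down under two PCIe rules: a memory request must not cross a 4 KiB address boundary, and a write payload cannot exceed the negotiated Max_Payload_Size. The 128-byte payload limit used in the note after the code is an assumption (it is the smallest value the specification allows), and `next_chunk` / `count_chunks` are made-up names. Whether the bridge or the DMA controller performs this splitting for you depends on the IP.

    ```c
    #include <assert.h>
    #include <stdint.h>

    #define PCIE_4K 4096u

    /*
     * Length of the next request starting at `addr` with `remaining` bytes
     * left: limited by max_payload and by the distance to the next 4 KiB
     * boundary, which a PCIe memory request must not cross.
     */
    uint32_t next_chunk(uint64_t addr, uint32_t remaining, uint32_t max_payload)
    {
        uint32_t to_boundary = PCIE_4K - (uint32_t)(addr % PCIE_4K);
        uint32_t len = remaining;
        if (len > max_payload)
            len = max_payload;
        if (len > to_boundary)
            len = to_boundary;
        return len;
    }

    /* Number of requests needed to move `len` bytes starting at `addr`. */
    unsigned count_chunks(uint64_t addr, uint32_t len, uint32_t max_payload)
    {
        assert(max_payload > 0);
        unsigned n = 0;
        while (len > 0) {
            uint32_t c = next_chunk(addr, len, max_payload);
            addr += c;
            len -= c;
            n++;
        }
        return n;
    }
    ```

    With a 128-byte limit, the 0x120-byte transfer mentioned above splits into 128 + 128 + 32 bytes when it starts on a 4 KiB boundary, so it can never go out as a single write TLP under that assumption.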
  • Altera_Forum
    Honored Contributor

    Hi all,

    - I already have a design in which a Nios II communicates with an x86 processor through shared memory over the PCIe interface.

    - In the new design I want to reverse this and place the shared memory in the x86 processor's DDR memory instead of the FPGA's SSRAM.

    - I am aware that this would require some fairly complex address translation logic in the fabric.

    - I understand that the "txs" port of the PCIe IP core is used to access host memory, but I would like to know how an access through that port reaches DDR memory (or some other internal memory) on the x86 side.

    - I would also like to know how the DDR memory on the x86 processor is used in this case.

    - I would be interested to hear from anyone who has already achieved something similar, and whether a reference design for this or a related configuration is available to start from.