Forum Discussion

Altera_Forum
Honored Contributor
14 years ago

Question about PCIe speed

I have been working on the chaining DMA example project for PCIe provided by Altera.

I am a bit confused about what performance I should expect though.

When sending 16.384 GBytes of data from a C program (or reading the same amount of data), the program runs for about 17 seconds, which gives a bit rate of 7.71 Gbps.

I am using Gen2 64-bit x4 lanes.

Gen2 is quoted at 5 Gbps per lane, but because of the 8b/10b encoding the effective rate is 4 Gbps. Since I am using 4 lanes I should expect a speed of 16 Gbps, is this correct?

This is twice what I have...
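
For reference, the arithmetic can be written out as a short script. The 16.384 GB and 17 s figures are the measurements above; the lane rate and 8b/10b overhead are standard PCIe Gen2 numbers:

```python
# Sketch: compare the measured throughput with the theoretical Gen2 x4 limit.
# Transfer size and time are the figures from the post above.

bytes_transferred = 16.384e9   # 16.384 GBytes
seconds = 17.0                 # measured transfer time

measured_gbps = bytes_transferred * 8 / seconds / 1e9
print(f"Measured: {measured_gbps:.2f} Gbps")              # Measured: 7.71 Gbps

lane_rate_gbps = 5.0           # Gen2 raw line rate per lane
encoding_efficiency = 8 / 10   # 8b/10b encoding overhead
lanes = 4

theoretical_gbps = lane_rate_gbps * encoding_efficiency * lanes
print(f"Theoretical (one direction): {theoretical_gbps:.1f} Gbps")
```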

I am on Linux and I have based the driver on the altpciechdma driver by Woestenberg and Heppermann. At some point it reports that it's using 32-bit DMA addressing instead of 64-bit. Could this be the reason I am two times slower than I should be?

When looking at the performance results in an456.pdf there seems to be no difference between Gen2 x4 64-bit and Gen2 x4 128-bit (I'm not sure what those bits refer to though; is it the same as the DMA mask?).

Finally, when they say 5.0 Gbps, is that one-way or two-way? I.e., should sending 100 MB and receiving 100 MB simultaneously take as long as sending OR receiving 200 MB?

Thank you!

6 Replies

  • Altera_Forum
    Honored Contributor

    --- Quote Start ---

    Since I am using 4 lanes I should expect a speed of 16Gbps, is this correct?

    This is twice what I have...

    --- Quote End ---

    PCIe DMA performance is restricted by both PCIe and PC memory throughput. You seem to assume that PC memory speed won't play a role, which is very unrealistic in my opinion.
  • Altera_Forum
    Honored Contributor

    What do you mean by PC memory speed?

    I looked at the amount of RAM used during the DMAs; there's plenty of free memory, so it doesn't seem like a bottleneck.

    I'm sending the data to the driver from a C program using the fwrite and fread functions (to read/write from/to the driver's device file); I don't think they would slow the system down.

    I'm not sure where the problem could be.

    Also, I'm still not sure whether the 5.0 Gbps is one-way or two-way. If it's two-way, then the problem might just be that I'm doing a DMA write followed by a read, followed by a write, etc., instead of launching both a DMA read and a DMA write at the same time (if that's possible).
  • Altera_Forum
    Honored Contributor

    5.0 Gbps is the two-way speed. Do you perform the read and write between the MCU and the FPGA?
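
If the link carries traffic in both directions at once (PCIe lanes are full duplex), overlapping a read with a write should roughly halve the total time compared to running them back to back. A rough sketch using the 100 MB figure from the original question, ignoring protocol overhead:

```python
# Sketch: back-to-back vs overlapped DMA transfers on a full-duplex link.
# 16 Gbps is the effective Gen2 x4 rate after 8b/10b; overhead is ignored.

effective_gbps = 16.0
transfer_mb = 100                      # size from the original question

one_way_s = transfer_mb * 8e6 / (effective_gbps * 1e9)
sequential_s = 2 * one_way_s           # DMA write, then DMA read
overlapped_s = one_way_s               # both launched simultaneously

print(f"Sequential: {sequential_s * 1e3:.0f} ms")   # Sequential: 100 ms
print(f"Overlapped: {overlapped_s * 1e3:.0f} ms")   # Overlapped: 50 ms
```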

  • Altera_Forum
    Honored Contributor

    I can't speak exactly to your scenario, but my situation is as follows. System setup:

    Motherboard: Asus AT5IONT-I

    OS: Windows 7

    FPGA: EP4CGX15BF14C7 (gen 1.0 x1 lane)

    Design: PCI Express to External Memory Reference Design

    My transfer speeds were (16kB transfer sizes):

    Theoretical limit: 250MB/s

    Actual (FPGA->computer): 198MB/s

    Actual (computer->FPGA): 120MB/s

    I hope that provides a more concrete basis for comparison.
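
The figures above work out to the following link utilization (the numbers are taken from this reply; the calculation is just a ratio):

```python
# Sketch: link utilization from the Gen1 x1 figures quoted above.
theoretical_mbs = 250.0                      # MB/s, Gen1 x1 after 8b/10b
measured = {"FPGA->computer": 198.0,         # MB/s, from the post
            "computer->FPGA": 120.0}

for direction, rate in measured.items():
    print(f"{direction}: {rate / theoretical_mbs:.0%} of theoretical")
```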
  • Altera_Forum
    Honored Contributor

    Thank you for the responses!

    I understand the whole thing a bit better now.

    By tweaking some parameters, I actually get a speed close to what you reported (multiplied by 8, since I am using 4 lanes and Gen2).

    There still is one problem though:

    I can get this speed when doing either a DMA read, or a DMA write. But I don't understand how to do both at the same time.

    I am using the chaining DMA example, and thus need to fill the descriptor table.

    I fill up the descriptors (endpoint address, root complex address, length of the data), then write the number of descriptors into the write header to launch a write, or the read header to launch a read.
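
The fill sequence just described can be sketched roughly as follows. The field layout, widths, and names here are hypothetical illustrations, not the actual chaining DMA descriptor format (see the reference design documentation for that):

```python
# Hypothetical sketch of building a descriptor table as described above.
# Real field widths and ordering differ; consult the chaining DMA docs.
import struct

DESC_FMT = "<QQI"  # endpoint addr, root-complex addr, length (hypothetical)

def make_descriptor(ep_addr, rc_addr, length):
    return struct.pack(DESC_FMT, ep_addr, rc_addr, length)

# Two hypothetical 4 KB transfers.
table = b"".join(make_descriptor(ep, rc, 4096)
                 for ep, rc in [(0x0000, 0x10000000),
                                (0x1000, 0x10001000)])

num_descriptors = len(table) // struct.calcsize(DESC_FMT)
# This count would then be written to the write (or read) header
# to launch the corresponding DMA.
print(num_descriptors)  # 2
```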

    From what I understand, both the DMA write and the DMA read modules share the same descriptor table, so how can I trigger them at the same time? How do they know which descriptor is for which module? Or is a local copy of the descriptor table generated after launching one of them, so that I can overwrite previous descriptors even before the operation is over?
  • Altera_Forum
    Honored Contributor

    OK I think I finally got it.

    As the documentation clearly says, there are two descriptor tables, not one. I got confused because the driver I was basing my work on was creating only one descriptor table in RC memory and using it for both DMA reads and DMA writes; I didn't realize I could just instantiate a second one.
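
In other words, the fix amounts to allocating two independent tables in RC memory, one per engine. A minimal sketch (sizes and names are made up for illustration):

```python
# Sketch: one descriptor table per DMA engine, so reads and writes can be
# in flight simultaneously. Sizes and names are hypothetical.
NUM_DESCRIPTORS = 128
DESC_BYTES = 20                               # hypothetical descriptor size

write_table = bytearray(NUM_DESCRIPTORS * DESC_BYTES)  # DMA writes (EP -> RC)
read_table = bytearray(NUM_DESCRIPTORS * DESC_BYTES)   # DMA reads  (RC -> EP)

# Each engine is pointed at its own table base, so filling one table
# never overwrites descriptors the other engine is still consuming.
print(len(write_table), len(read_table))  # 2560 2560
```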