Forum Discussion

Altera_Forum
Honored Contributor
16 years ago

PCI Express Tx interface gets stalled

Hi,

I'm still trying to send data (A/D samples) from my PCI Express endpoint (S4 GX Dev. Kit. -> Hard IP) to the root port memory (RAM).

I'm able to receive data from the root port and return a completion, but when I try to send a memory write transaction, it seems no data is sent at all.

The first packet, with a payload of 512 DWORDs, can be transferred to the MegaFunction without a problem (but it is not sent). On the second packet (after ~10 clocks), the ready signal of the Avalon-ST Tx interface goes low and FIFO full goes high at the same time.

There is plenty of time between the two packets, so I assume that timing is not an issue. Also, the headers and data created/sent seem to be correct. Btw, I don't use any of Altera's reference designs; I did the implementation from scratch.

Unfortunately I can't tap into the Hard IP so I have nooooo idea what's going on there.

Does anybody have suggestions about what might be going wrong here, or more detailed information about the MegaFunction's internal operation? Pleeeease???

20 Replies

  • Altera_Forum
    Honored Contributor

    The key is that the waveform in figure 5-14 applies to 3 Dword header TLPs with NON-Qword aligned addresses. There is a comment about this in the text immediately above the waveform. This comment implies the address restriction I mentioned in the previous post. I hope this helps.

  • Altera_Forum
    Honored Contributor

    I see, but wouldn't this mean that I can't write to the first DWORD of a memory page for example?

    Since memory write requests are not allowed to cross a page boundary I can't just set the address to the last DWORD of the page before, and insert an invalid DWORD (by setting the byte enables for the first DW to zero).

    Have you (or has anyone else) done something similar (writing from the endpoint to the root port memory) using the hard IP on the Stratix IV GX Dev. Kit?
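
    The boundary rule mentioned above can be checked before a write request is built. This is a minimal sketch, assuming a 4 KB boundary and a length counted in DWORDs; the function name `crosses_4k_boundary` is made up for illustration and is not part of any Altera or WinDriver API.

```python
def crosses_4k_boundary(address: int, length_dw: int) -> bool:
    """Return True when a write of length_dw DWORDs starting at address
    would touch bytes on both sides of a 4 KB boundary."""
    first_byte = address
    last_byte = address + length_dw * 4 - 1
    # Two byte addresses are in the same 4 KB block when they agree
    # on everything above the low 12 bits.
    return (first_byte >> 12) != (last_byte >> 12)
```

    For example, a 2-DWORD write starting at FFCh would cross into the next page, while a write starting at 1000h can carry up to 1024 DWORDs without crossing.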
  • Altera_Forum
    Honored Contributor

    No, I wouldn't expect that to be the case. Assuming a 4KB page, the 12 LSBs of a page address are guaranteed to be '0', which also guarantees that the same address is QWORD aligned.

    I have successfully done exactly what you are trying to do with both the PCIe high performance reference design and a home brew design, both running on the Stratix IV dev kit. I have also successfully run the Altera design under Windows using the reference design drivers and code and under Linux using a custom driver and application. I have only run the home brew design under Linux.
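
    The alignment argument above (a page-aligned address is always QWORD aligned) can be sanity-checked with a few lines. This is only a sketch, assuming 4 KB pages; the helper name `is_aligned` is illustrative.

```python
PAGE_SIZE = 4096   # assumed 4 KB page
QWORD_SIZE = 8     # a QWORD is 8 bytes

def is_aligned(addr: int, size: int) -> bool:
    """True when addr is a multiple of size (size must be a power of two)."""
    return (addr & (size - 1)) == 0

# Every page base address has its 12 LSBs at zero, so its 3 LSBs are zero too.
page_bases = [n * PAGE_SIZE for n in range(16)]
all_qword_aligned = all(is_aligned(a, QWORD_SIZE) for a in page_bases)
```

    An address such as 1004h fails the same check, which is the non-QWORD-aligned case discussed in this thread.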
  • Altera_Forum
    Honored Contributor

    But you said that when I use a 3 DW header the address has to be non-QWORD aligned (the 3 LSBs are 100b), which means I'm starting at the second DWORD?

    Maybe there is something wrong with my entire concept?

    I don't use a DMA core like in the reference design... I'm using WinDriver to create a contiguous buffer in memory and send the physical address of the buffer to the endpoint. Then I set an enable flag in my endpoint from within the driver, and from that point the endpoint keeps sending the sampled data to the root port memory (the mentioned buffer) until it is disabled. From my understanding of PCI Express this should work, but maybe I'm wrong?

    I've also been digging for some software that can be used to debug the PCI Express bus, and found this utility that can be used to analyze the PCI Express configuration space (btw, it's free):

    http://www.lecroy.com/protocolanalyzer/protocoloverview.aspx?seriesid=193&capid=103&mid=511

    Maybe this will help me figure out what is going wrong.
  • Altera_Forum
    Honored Contributor

    Sorry for the confusion on the address point. I will try to clarify what the documentation is saying. Note that what I will describe below assumes the following:

    * 64-bit AV-ST interface on the backend of the PCIe hard ip core

    * packet payload length of 4 dwords

    When your starting address is a 32-bit non-QWORD aligned address, your packet will look like:

    DW0: packet header dword0, SOP=1, EOP=0

    DW1: packet header dword1, SOP=1, EOP=0

    DW2: packet header dword2, SOP=0, EOP=0

    DW3: payload dword0, SOP=0, EOP=0

    DW4: payload dword1, SOP=0, EOP=0

    DW5: payload dword2, SOP=0, EOP=0

    DW6: payload dword3, SOP=0, EOP=1

    DW7: don't care, SOP=0, EOP=1

    When your starting address is a 32-bit QWORD aligned address, your packet will look like:

    DW0: packet header dword0, SOP=1, EOP=0

    DW1: packet header dword1, SOP=1, EOP=0

    DW2: packet header dword2, SOP=0, EOP=0

    DW3: don't care, SOP=0, EOP=0

    DW4: payload dword0, SOP=0, EOP=0

    DW5: payload dword1, SOP=0, EOP=0

    DW6: payload dword2, SOP=0, EOP=1

    DW7: payload dword3, SOP=0, EOP=1

    When your starting address is a 64-bit non-QWORD aligned address, your packet will look like:

    DW0: packet header dword0, SOP=1, EOP=0

    DW1: packet header dword1, SOP=1, EOP=0

    DW2: packet header dword2, SOP=0, EOP=0

    DW3: packet header dword3, SOP=0, EOP=0

    DW4: don't care, SOP=0, EOP=0

    DW5: payload dword0, SOP=0, EOP=0

    DW6: payload dword1, SOP=0, EOP=0

    DW7: payload dword2, SOP=0, EOP=0

    DW8: payload dword3, SOP=0, EOP=1

    DW9: don't care, SOP=0, EOP=1

    When your starting address is a 64-bit QWORD aligned address, your packet will look like:

    DW0: packet header dword0, SOP=1, EOP=0

    DW1: packet header dword1, SOP=1, EOP=0

    DW2: packet header dword2, SOP=0, EOP=0

    DW3: packet header dword3, SOP=0, EOP=0

    DW4: payload dword0, SOP=0, EOP=0

    DW5: payload dword1, SOP=0, EOP=0

    DW6: payload dword2, SOP=0, EOP=1

    DW7: payload dword3, SOP=0, EOP=1

    The example should illustrate that it is not the number of dwords in the header that defines how the data is aligned in the dwords of the avalon packet that you transmit to the PCIe core. Instead, it is the address that defines the alignment.
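
    The four layouts above follow one rule: the first payload dword lands in the first half of a 64-bit beat when the address is QWORD aligned, and in the second half otherwise. Below is a minimal sketch of that rule as a software model. The names `pack_tlp_64bit` and `Beat` are hypothetical, `None` stands for a "don't care" dword, and the model covers only dword placement and the SOP/EOP flags listed above; it says nothing about which bits of the bus carry which dword or about any other interface signals.

```python
from typing import List, Optional, Tuple

# One 64-bit Avalon-ST beat: (first dword, second dword, sop, eop).
Beat = Tuple[Optional[int], Optional[int], bool, bool]

def pack_tlp_64bit(header: List[int], payload: List[int], address: int) -> List[Beat]:
    """Arrange header and payload dwords into 64-bit beats, inserting
    don't-care dwords (None) according to the address alignment."""
    if len(header) not in (3, 4):
        raise ValueError("header must be 3 or 4 dwords")
    dwords: List[Optional[int]] = list(header)
    address_qword_aligned = (address % 8) == 0
    next_slot_is_first_half = (len(dwords) % 2) == 0
    # Pad after the header when the next free slot does not match the
    # half of the beat that the address alignment calls for.
    if next_slot_is_first_half != address_qword_aligned:
        dwords.append(None)
    dwords.extend(payload)
    # Fill the last beat with a don't-care dword if it is only half used.
    if len(dwords) % 2:
        dwords.append(None)
    last = len(dwords) - 2
    return [(dwords[i], dwords[i + 1], i == 0, i == last)
            for i in range(0, len(dwords), 2)]
```

    With a 3-dword header, 4 payload dwords, and address 1000h, this yields four beats with the don't-care dword right after the header; with address 1004h it also yields four beats, but the don't-care dword moves to the end of the packet.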
  • Altera_Forum
    Honored Contributor

    Thanks for your patience, but I'm afraid I still haven't fully understood it :rolleyes:

    What I wanna do is to fill a buffer that is page aligned (starts at a memory page border for example 1000h) continuously with data.

    Since my memory address is 32 bits I have to use a 3DW header (using a 4 DW header with the 32 MSBs set to zero would be treated as a malformed TLP by the receiver). According to your explanation the address 1000h is QWORD aligned, and thus I would have to insert an invalid (byte enable set to zero) DW3? But when I do so the address where the first valid DWORD will be put is (according to the attached image from the PCIe system arch. book) not 1000h, but 1004h?

    Maybe this doesn't have anything to do with the original problem of the Tx interface getting stalled, but I just wanna make sure...
  • Altera_Forum
    Honored Contributor

    I think the confusion is regarding what happens to the "garbage" data inserted to adhere to the data alignment requirements of the core. Any "garbage" data inserted on the Avalon side to adhere to the data alignment guidelines will not be transmitted on the PCIe link. Since this garbage data is not transmitted on the link, the address specified in the header refers to the first valid piece of data, not the "garbage" data. Likewise, the first dword byte enable in the header refers to the first valid dword of data, not any "garbage" data.

    Using your specific example, although you are using a 3DW header, the 4th dword seen at the Avalon streaming interface would contain some "garbage" data. Even though the "garbage" data exists on the Avalon interface, this "garbage" data will not be transmitted on the PCIe link. As a result, the destination address of the first valid dword of data you send will be 1000h. Similarly, the 1st DW byte enable refers to the 1st dword of valid data after the "garbage" data. So, assuming all four bytes of the 1st dword of valid data are to be written, the 1st DW BE field will be set to 0xF in the header.

    Does this help clarify?
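
    To make the header side concrete, here is a minimal sketch of a 3DW memory write header for the 1000h example, following the standard PCIe TLP field layout (Fmt/Type and Length in dword 0, Requester ID, Tag, and byte enables in dword 1, address in dword 2). The function name `mwr32_header` is hypothetical, the requester ID and tag default to 0 as placeholders, and all other header bits (TC, TD, EP, Attr) are assumed to be zero. The Length field counts only valid payload dwords, not the "garbage" dword on the Avalon side.

```python
from typing import List

def mwr32_header(address: int, length_dw: int,
                 requester_id: int = 0, tag: int = 0) -> List[int]:
    """Build the three header dwords of a 32-bit memory write request."""
    if address & 0x3 or address >> 32:
        raise ValueError("address must be DWORD aligned and fit in 32 bits")
    if not 1 <= length_dw <= 1024:
        raise ValueError("length must be 1..1024 DWORDs")
    first_be = 0xF                             # all bytes of the first valid dword
    last_be = 0x0 if length_dw == 1 else 0xF   # must be 0 for a 1-dword request
    fmt_type = 0x40                            # Fmt=10b (3DW, with data), Type=00000b
    dw0 = (fmt_type << 24) | (length_dw & 0x3FF)   # 1024 is encoded as 0
    dw1 = (requester_id << 16) | (tag << 8) | (last_be << 4) | first_be
    dw2 = address & 0xFFFFFFFC                 # address of the first valid dword
    return [dw0, dw1, dw2]
```

    For a 4-dword write to 1000h this gives the header dwords 40000004h, 000000FFh, and 00001000h: the address and the 1st DW BE both describe the first valid dword.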
  • Altera_Forum
    Honored Contributor

    I think so.

    I assumed the byte enable bits are used to mark data as "garbage".

    But from your explanation I understand that when using the 3DW header together with the QWORD aligned address (like 1000h) I have to insert a "garbage" DWORD (DW3), and set the 1st DW enable to all ones (since all my real data is valid)?

    This "garbage" DWORD doesn't count toward the payload length since it won't be sent over the link, right?
  • Altera_Forum
    Honored Contributor

    Now I see what the problem might be: since I didn't insert this "garbage" DWORD, my message is one DWORD too short, which probably confuses the transmitter. I'll check that when I'm back in the office tomorrow...

    Thanks a lot for your help so far!

    edit: yes, inserting the "garbage" DWORD etc. fixed the problem :D