Forum Discussion

BrianM (Occasional Contributor)
5 years ago
Solved

Bug! Quartus Pro 20.1.1, Cyclone V, utilizing PCIe example from 16.1.

Problem Details
Error:
Internal Error: Sub-system: VPR20KMAIN, File: /quartus/fitter/vpr20k/altera_arch_common/altera_arch_place_anneal.c, Line: 2744
Internal Error
Stack Trace:
     0xdce7a: vpr_qi_jump_to_exit + 0x6f (fitter_vpr20kmain)
    0x797f83: vpr_exit_at_line + 0x53 (fitter_vpr20kmain)
    0x2ec700: l_initial_low_temp_moves + 0x1cb (fitter_vpr20kmain)
    0x6f97f1: l_thread_pool_do_work + 0x41 (fitter_vpr20kmain)
    0x2c74fb: l_thread_pool_fn + 0x4e (fitter_vpr20kmain)
     0xefe28: l_thread_start_wrapper(void*) + 0x29 (ccl_thr)
      0x5acc: thr_final_wrapper + 0xc (ccl_thr)
     0x3eeef: msg_thread_wrapper(void* (*)(void*), void*) + 0x62 (ccl_msg)
      0x9f9c: mem_thread_wrapper(void* (*)(void*), void*) + 0x5c (ccl_mem)
      0x8b39: err_thread_wrapper(void* (*)(void*), void*) + 0x27 (ccl_err)
      0x5b0f: thr_thread_wrapper + 0x15 (ccl_thr)
      0x5df2: thr_thread_begin + 0x46 (ccl_thr)
      0x7f9e: start_thread + 0xde (pthread.so.0)
     0xfd0af: clone + 0x3f (c.so.6)
End-trace


Executable: quartus
Comment:
Device is very full. I'm trying to shoehorn it in by forcing most RAM into M10K. Reducing the PCIe DMA buffer from 256K to 128K got me the space I needed, but now this crash occurs.

System Information
Platform: linux64
OS name: This is
OS version:

Quartus Prime Information
Address bits: 64
Version: 20.1.1
Build: 720
Edition: Standard Edition

I'd love some support. I'm trying to get PCIe Root Port working in Cyclone V and so far have not found any examples that will fit and meet timing in the part I chose: 5CSXFC5D6F31C7.

I followed an example called: cv_soc_rp_simple_design and it won't fit with my logic.

I followed another example that didn't have bus syncs, where I tied everything from PCIe directly to the HPS and the xcvr_reconfig block, but that missed timing by up to 1.8ns on the 125MHz path to the main HPS DRAM.

This failure happens with the former, cv_soc_rp_simple_design/pcie_rp_ed_5csxfc6.qsys, which didn't fit until I trimmed all the unnecessary logic (including the JTAG port) and shrank all the bus retimers to be as small as possible. The last change was shrinking the 256KB PCIe DMA buffer to 128K, and then this error occurred.

If there's a simple way to send the database through the FAEs I've been working with, let me know.

Thanks in advance.

  • I got it working.

    The last hurdle was the MSI interface. I'm not sure why it wasn't working, but restarting the design from scratch with my recently acquired knowledge got everything working.

    I've attached my qsys file and socfpga.dtsi. Hopefully it will help others get a jump on things so they don't have to learn everything the hard way as I did.

    This design is not optimized for speed, nor is it optimized for space.

    NVMe read speed is around 80 MB/s.

    NVMe write speed is around 50 MB/s.

    Faster drives will do a little better, but even a 4-lane part that does over a gigabyte per second in a PC doesn't do much better than 110 MB/s read here. Bandwidth is limited by the ARM memory interface and by the fact that bursting logic at 125 MHz causes timing violations. A burst length of one on the Txs interface has got to slow things down.

    Performance is quoted with the 5.11 kernel; the 5.4 kernel is not quite as fast. Still fast enough for an embedded system, though.

    Don't forget to enable the fpga, pcie, and msi nodes in your top-level board dts file.

    And don't forget to reserve the first 64K of DRAM as stated above. If you don't, you will get read errors.
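    For what it's worth, here is a hypothetical sketch of what those board-level dts overrides can look like. The node labels (fpga_bridge0, pcie_0, msi_to_gic_gen_0) are assumptions based on common socfpga dtsi naming, not taken from my attached files; they must match the labels in your own dtsi:

```dts
/* Hypothetical board-level overrides -- the label names must match your dtsi */
&fpga_bridge0 {
	status = "okay";
};

&pcie_0 {
	status = "okay";
};

&msi_to_gic_gen_0 {
	status = "okay";
};
```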

    There are probably better configurations, and eventually I'll probably try to optimize for size, since my logic will need to get bigger on the next project. But for now, it finally works.

    Good luck to you all.

17 Replies

    • BrianM (Occasional Contributor)

      Thanks for the reply.

      I ended up using this: https://releases.rocketboards.org/release/2015.10/pcie-ed/hw/cv_soc_rp_simple_design.tar.gz

      I think that is the same one that you used.

      When I included my design with it, it did not fit. I ended up reducing the PCIe module's functionality by reducing synchroniser memory usage and switching all my FIFOs to M10K to save MLAB space.

      The error I posted above occurred when I changed all the resync buffers to 1 deep, switched all my FIFOs to M10K, and cut the 256K DMA buffer to 128K.

      After I made the post, I cut the DMA buffer down to 60K and for the first time all the code fit and met timing.

      Now that the FPGA seems like it might work, I need to get linux working with the PCIe port.

      sopc2dts (the Java tool) does not generate a usable file. There are a few features I had to add to the source code to get it to run on the sopcinfo file generated by 20.1.1 (the tool did not know about altera_pll, and there was a bug retrieving the Txs port address). Even after those modifications, and even though the kernel accepts the result, it does not work and does not produce any kernel error messages. I end up with hardware that is missing a CPU, Ethernet, and many other features, including PCIe.

      If I could see a working example of the device tree for the pcie, msgdma, etc, then I'm certain I can get my hardware to work with the linux kernel using a manually created dts I made based on the socfpga device tree files found in the linux kernel.

      Do you have an example of the device tree entries for this design that function on kernel 5.4 that you can show me?

      Thanks again for your help. I really appreciate it.

  • SengKok_L_Intel (Regular Contributor)

    No additional design that I am aware of other than the design posted at rocketboards.


    • BrianM (Occasional Contributor)

      I'm not going to be able to get anywhere unless I get some help creating a dts. Are there contractors with expertise in these matters?

      More importantly, has anyone noticed that rocketboards.org has lost its domain?

      • BrianM (Occasional Contributor)

        I need help with the dts. Surely there is someone somewhere who knows how to write a dts for kernel 5.4.72 and Quartus Pro 20.1.1.

  • SengKok_L_Intel (Regular Contributor)

    Hi,

    Apologies, but there is not much helpful info regarding Root Port driver development that can be offered here. Your understanding is much appreciated.


  • SengKok_L_Intel (Regular Contributor)

    If further support is needed in this thread, please post a response within 15 days. After 15 days, this thread will be transitioned to community support. The community users will be able to help you with your follow-up questions.


    • BrianM (Occasional Contributor)

      Let's keep this open until I figure it out. I'll update the thread with the solution.

      • BrianM (Occasional Contributor)

        I've made significant progress. Let me state some things I've learned so people don't have to go through the same stuff.

        1. You must regenerate u-boot every time you change Platform Designer's qsys file. This took me two weeks to realize. Nothing worked because I simply hadn't understood that there is a huge amount of hardware configuration in u-boot that the kernel cannot do because once the DRAM is active, the registers can't be changed.

        2. In order to run at full speed, 125 MHz, you must use clock-crossing and pipeline bridges. The "auto constructor" software is not generally good enough to build connections that will run at speed.

        3. All addresses listed in Platform Designer must be consistent across MM Masters. Period. The software assumes that and there is no way to work around it.

        4. It does not appear (although I still have to confirm this) that the MSGDMA and embedded SRAM modules are used at all by the linux kernel. There is no point in adding them.

        5. Bandwidth from a two-lane NVMe is roughly 132 MB/s read. But my config is not working yet, so this will need to be confirmed as well.
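        On point 1, the regeneration flow I'm describing follows the mainline u-boot handoff approach. This is a sketch only: the qts-filter.sh argument order, the placeholder paths, and the board qts directory are assumptions that should be checked against the script's own usage text before relying on them.

```shell
# Sketch: regenerate u-boot after changing the qsys file (mainline u-boot flow).
# <project> and <hps_instance> are placeholders for your Quartus project
# directory and HPS instance name; verify the exact argument order in the
# header comment of qts-filter.sh before running.

# 1. Recompile the hardware in Quartus so hps_isw_handoff/ is regenerated.

# 2. Convert the handoff data into u-boot's qts headers.
#    qts-filter.sh ships with u-boot under arch/arm/mach-socfpga/.
cd u-boot
./arch/arm/mach-socfpga/qts-filter.sh cyclone5 \
    <project> <project>/hps_isw_handoff/<hps_instance> \
    board/altera/cyclone5-socdk/qts/

# 3. Rebuild SPL and u-boot against the new handoff data.
make socfpga_cyclone5_defconfig
make -j"$(nproc)"
```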

        I think I'm stuck in the same way as the posters in this thread: https://forum.rocketboards.org/t/altera-pcie-driver-issue-with-ssd-devices/545

        There is an issue with the way the Altera PCIe linux driver programs the Root Port to write DRAM, and I have not been able to get it to work. There are two proposed fixes in that thread (reported working on versions from 15.1 to 18.1). I have not tried the one that requires editing the verilog generated by Platform Designer; building a flow on code hacks does not appeal to me. I have tried the second fix, but either I have misinterpreted how they got it to work, or more recent software does not allow the hack. I have tried dozens of configurations, and all have failed. If I tamper with the offset address of Txs, Root Port DMA deadlocks every time, I assume because the data does not go where it is expected to be.

        As documented in the thread, the issue occurs when reading from the PCIe device. One group was reading from a PCIe SATA controller and the other group from an NVMe like me. Reads fail. Zero data is returned instead of actual data.

        According to the thread, the Root Port is trashing RAM, writing over kernel space or something like that; I don't fully understand the claims. One proposed solution adds an address span extender between the h2f HPS interface and the Txs port, adjusts the Txs offset by 64 MB, and reserves the second 64 MB of RAM for Root Port DMA space. The offset is magically added to the DRAM destination address, thus preventing the Root Port from destroying DRAM. The other solution shifts the Txs Root Port DMA out of DRAM altogether by hacking the generated code to add an offset of 0x50000000 to the Txs address. I don't see how that can work, since bit 30 is not utilized on the h2f interface and the Root Port needs to write to physical DRAM to bring data in from the PCIe device.
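        To make the first fix concrete, the "second 64 MB" reservation might be expressed with a reserved-memory node like the one below. The addresses are illustrative only, derived from the description above, and the node label is my own invention:

```dts
/* Hypothetical: reserve the second 64 MB of DRAM for Root Port DMA */
reserved-memory {
	#address-cells = <1>;
	#size-cells = <1>;
	ranges;

	pcie_dma_reserved: pcie_dma@4000000 {
		reg = <0x04000000 0x04000000>; /* base = 64 MB, size = 64 MB */
		no-map;
	};
};
```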

        In my case I get errors such as this:

        root@cyclone5:~# ./hdparm -tT /dev/nvme0n1p1

        /dev/nvme0n1p1:
        Timing cached reads: 940 MB in 2.00 seconds = 469.80 MB/sec
        Timing buffered disk reads: [ 204.434383] blk_update_request: critical medium error, dev nvme0n1, sector 281088 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
        [ 204.449791] blk_update_request: critical medium error, dev nvme0n1, sector 281096 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
        [ 204.461410] Buffer I/O error on dev nvme0n1p1, logical block 34881, async page read
        read(2097152) returned 266240 bytes

        After that I can do an fsck.ext4 /dev/nvme0n1p1 and repair the drive, but the next read will fail again.

        Sometimes the read hangs the CPU and the watchdog triggers, which genuinely suggests that code space is being corrupted.

        I have studied the 'ranges' and the 'dma-ranges' properties. I have dug through the code to try to understand how I can control the addresses used by the Root Port. I have not found anything that seems to work. The only thing I've succeeded in doing is causing the pcie driver to not load because it can't find the Txs port and the nvme driver to hang.
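        For anyone following along, the kind of node I've been experimenting with looks roughly like this. It is a sketch based on the mainline altera-pcie binding (compatible "altr,pcie-root-port-1.0"); the Txs/Cra addresses, interrupt numbers, msi-parent label, and ranges values are all assumptions that must come from your own Platform Designer address map, not values known to work:

```dts
/* Hypothetical Root Port node -- addresses and interrupts must match your design */
pcie_0: pcie@c0000000 {
	compatible = "altr,pcie-root-port-1.0";
	reg = <0xc0000000 0x10000000>,   /* Txs slave (via the h2f bridge) */
	      <0xff220000 0x00004000>;   /* Cra control/status */
	reg-names = "Txs", "Cra";
	device_type = "pci";
	#address-cells = <3>;
	#size-cells = <2>;
	bus-range = <0x0 0xff>;
	interrupt-parent = <&intc>;
	interrupts = <0 40 4>;
	interrupt-controller;
	#interrupt-cells = <1>;
	interrupt-map-mask = <0 0 0 7>;
	interrupt-map = <0 0 0 1 &pcie_0 1>,
	                <0 0 0 2 &pcie_0 2>,
	                <0 0 0 3 &pcie_0 3>,
	                <0 0 0 4 &pcie_0 4>;
	msi-parent = <&msi_to_gic_gen_0>;
	/* BAR window: 256 MB non-prefetchable MEM at the Txs base */
	ranges = <0x82000000 0x00000000 0x00000000
	          0xc0000000 0x00000000 0x10000000>;
};
```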

        I feel I am very close to getting this to work, but I need a better understanding of how the Root Port should be configured to ensure "DMA" works from the Root Port to DRAM.

        Thank you for your patience and any assistance you can offer.