FvM
Super Contributor
2 years ago

PCIe hprxm_master doesn't handle unaligned reads correctly
Hello,
we have an Arria 10 PCIe design with an MM DMA interface, using hprxm_master (bursting BAR2 rxm) for non-DMA access. The Platform Designer IP does a good job translating between the different clock domains and interface widths of the various slave components. However, there seems to be a problem with unaligned hprxm reads (crossing a 256-bit boundary) from slow slaves. In the attached Signal Tap screenshot, I'm performing a 64-bit read at offset 0x1c. hprxm_master splits it into two reads (burstcount = 2). The hprxm read is requested at clock 0, the first readdatavalid arrives at clock 55 and the second at clock 100. The large latency is caused by a clock crossing bridge and interface width translation. Unfortunately, hprxm sends the completion message with data already at clock 63, without waiting for the second read, i.e. it delivers arbitrary wrong data for the second word. The same unaligned read performed via DMA gives the correct result, as does an unaligned read from a fast slave (core clock domain + wide interface), see the second Signal Tap screenshot. There, both reads complete without intermediate latency.
I consider the reported hprxm_master behaviour a bug; the DMA path proves that unaligned reads can work correctly with slow slaves. The IP is from the QPP 22.4 design suite.
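To make the expected behaviour explicit, here is a minimal Python sketch of the burst-read bookkeeping the master should perform. This is an illustrative model only, not the actual hprxm_master RTL; the function name and data values are made up. The point is that the completion is only legal once one readdatavalid beat per burstcount word has arrived, whereas the observed bug corresponds to completing after the first beat.

```python
def collect_burst(beats, burstcount):
    """Gather readdata words from a stream of (readdatavalid, readdata)
    samples, one per clock. Sending the PCIe completion is only legal
    once `burstcount` valid beats have been received."""
    data = []
    for cycle, (valid, word) in enumerate(beats):
        if valid:
            data.append(word)
        if len(data) == burstcount:
            return cycle, data  # earliest legal completion cycle
    raise RuntimeError("burst never completed")

# Model of the slow-slave case from the screenshot: a 64-bit read split
# into two words, first beat at clock 55, second at clock 100 (latency
# from the clock crossing bridge and width adaptation).
beats = ([(False, None)] * 55 + [(True, 0x1111)]
         + [(False, None)] * 44 + [(True, 0x2222)])
cycle, data = collect_burst(beats, burstcount=2)
# Completion must not be sent before clock 100; hprxm_master sends it
# at clock 63 and therefore returns garbage for the second word.
```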
Best regards
Frank