Forum Discussion
Altera_Forum
Honored Contributor
12 years ago

--- Quote Start ---

It is certainly much easier if you use a single BAR to access all Avalon slaves, and make the base of the addressed area include Avalon address zero. Basically the PCIe slave block removes the high address bits from the PCIe address, and then (effectively) substitutes a different set of high address bits, fixed for each BAR. The SOPC code had some strange restrictions on where BARs could address; Qsys may have similar ones. I suspect that they've tried to make it 'simple' and only succeeded in making it confusing!

The Nios CPU (without MMU) uses the high address bit to mean 'cache bypass', so it can only generate 31-bit addresses (including the address bits that convert to byte enables).

The PCIe master interface is rather horrid: its requirements don't really match those of an Avalon slave. For single-cycle PIO requests I'd be tempted to write an Avalon slave that can latch the required 64-bit PCIe address and data, and then be told to perform a single master transfer, then poll the slave interface for when the request finishes. (A bit like a very degenerate DMA controller.)

For longer transfers a DMA controller that can read Avalon data and then burst write to the PCIe slave (or burst read the PCIe slave and write to Avalon addresses) would be useful, with the Nios CPU polling for completions and managing any request queue. But I can't immediately see how to use any of the existing DMA controllers for that purpose.

--- Quote End ---

OK DSL, thanks. I believe I set up BAR1 so that it starts at Avalon address 0x00010000 (the scratchpad IM). So if PCIe accesses are offsets from the BAR1 base, I may need to change the PCIe references to BAR1 + 0 to reach the first scratchpad IM location, and not use 0x00010000, which is the address the NIOS master uses for it.
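To make the BAR1 + 0 point concrete, here is a minimal sketch of the translation as described in the quote (drop the high PCIe address bits, substitute fixed Avalon high bits). It is only a model, not the Altera IP's actual logic. The function name, the host-assigned BAR1 address and the 64 KiB window size are made up for illustration; only 0x00010000 comes from the post.

```python
def bar_to_avalon(pcie_addr: int, bar_size: int, avalon_base: int) -> int:
    """Model the BAR translation: keep the offset bits, swap in fixed high bits."""
    if bar_size <= 0 or bar_size & (bar_size - 1):
        raise ValueError("BAR size must be a power of two")
    if avalon_base & (bar_size - 1):
        raise ValueError("Avalon base must be aligned to the BAR size")
    offset = pcie_addr & (bar_size - 1)   # high PCIe address bits are discarded
    return avalon_base | offset           # fixed per-BAR high bits are substituted


# Hypothetical values: the host-assigned address and the window size are assumptions.
BAR1_PCIE_BASE = 0xD000_0000          # assumed address the root complex gave BAR1
BAR1_SIZE = 0x1_0000                  # assumed 64 KiB window
SCRATCHPAD_AVALON_BASE = 0x0001_0000  # scratchpad address as seen by the NIOS master

first_word = bar_to_avalon(BAR1_PCIE_BASE + 0x0, BAR1_SIZE, SCRATCHPAD_AVALON_BASE)
print(hex(first_word))  # 0x10000
```

Under these assumptions the host reaches the first scratchpad location at BAR1 + 0, while the NIOS master uses 0x00010000 for the same location.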
One more thing: when I do a read from the NIOS to the RC (system memory) over Gen1 x1, the time from the read request on the link to the read completion is a whopping 900 ns, approximately. Since the ARM system (the RC) is running its DDR at 400 MHz, I'm trying to figure out where the 360 clock cycles went. There are some theories:

1) The link is going into L0s (a low-power state), but I don't see gaps of more than several microseconds, so I doubt this.
2) The read completion is queued behind posted writes. Maybe, but I see the same kind of latency when there are no posted writes.
3) The read completion is waiting for some token or resource at the FPGA PCIe endpoint?

Any other ideas?

Thanks,
Bob.
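For reference, the arithmetic behind the 360-cycle figure, plus a rough estimate of how long a small packet occupies a Gen1 x1 link, can be sketched as follows. The 900 ns and 400 MHz values are from the post; the 2.5 GT/s rate with 8b/10b encoding is the standard Gen1 figure. The 24-byte packet size is an assumption for a small completion and was not measured, and the function names are made up.

```python
def latency_in_cycles(latency_ns: float, clock_mhz: float) -> float:
    """Convert a latency in nanoseconds to clock cycles at the given frequency."""
    return latency_ns * clock_mhz / 1000.0


def gen1_x1_wire_time_ns(bytes_on_wire: int) -> float:
    """Time to serialize bytes on one Gen1 lane: 10 bits per byte at 2.5 Gbit/s."""
    return bytes_on_wire * 10 / 2.5


cycles = latency_in_cycles(900, 400)       # figures from the post
print(cycles)  # 360.0

# Assumed on-wire size of a small completion (header, data, framing); hypothetical.
ASSUMED_COMPLETION_BYTES = 24
wire_ns = gen1_x1_wire_time_ns(ASSUMED_COMPLETION_BYTES)
print(wire_ns)  # 96.0
```

If the assumed size is in the right range, serializing one small packet accounts for only about a tenth of the 900 ns, so most of the delay would have to come from elsewhere.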