I think we just used a PCIe slave (i.e. an endpoint, not the root complex) for a link to a small PowerPC.
The most useful way to use it was via the PCIe-to-Avalon master bridge, which lets a single PCIe BAR reach a lot of FPGA peripherals. I think the bridge sets the high Avalon address bits to fixed values; we used a 32MB BAR to access all the I/O and the 16MB SDRAM.
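To illustrate the idea of one BAR window fronting a chunk of Avalon address space, here is a rough sketch of the mapping, assuming the low bits of the BAR offset pass straight through and the high bits are replaced by a fixed value. The base address 0x1000_0000 and the memory layout in the example are hypothetical, not what the bridge or our design actually used.

```python
# Hypothetical values for illustration only.
BAR_SIZE = 32 * 1024 * 1024        # 32 MB BAR window (power of two)
AVALON_WINDOW_BASE = 0x1000_0000   # assumed fixed high Avalon address bits


def bar_offset_to_avalon(offset: int) -> int:
    """Map an offset inside the PCIe BAR to an Avalon-MM address.

    Low bits (within the BAR size) pass through unchanged; the high bits
    come from a fixed base, as the bridge is assumed to do.
    """
    if not 0 <= offset < BAR_SIZE:
        raise ValueError(f"offset {offset:#x} is outside the {BAR_SIZE:#x}-byte BAR")
    mask = BAR_SIZE - 1
    return (AVALON_WINDOW_BASE & ~mask) | (offset & mask)


# Hypothetical layout: 16 MB SDRAM at BAR offset 0, I/O registers from 16 MB up.
sdram_addr = bar_offset_to_avalon(0x0000_0000)
io_reg_addr = bar_offset_to_avalon(0x0100_0010)
```

With a 32MB window there is room for the 16MB SDRAM plus another 16MB of peripheral registers behind the same BAR.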
One thing worth noting is that single-cycle accesses through the PCIe slave are much slower than you might expect (think ISA bus speeds). To get any reasonable throughput you'll need DMA transfers that generate PCIe bursts.
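A back-of-envelope model shows why bursts matter: each transaction pays a fixed overhead (for a read, the full round trip), so moving 4 bytes at a time wastes most of the link. The 1 µs per-transaction overhead below is an illustrative assumption, not a measurement; 4 ns/byte corresponds to the 250 MB/s data rate of a Gen1 x1 link.

```python
def throughput_mb_s(bytes_per_txn: int, overhead_ns: float, ns_per_byte: float) -> float:
    """Effective throughput in MB/s (10**6 bytes/s) for back-to-back transactions.

    Each transaction costs a fixed overhead plus the time to move its payload.
    """
    time_ns = overhead_ns + bytes_per_txn * ns_per_byte
    return bytes_per_txn / time_ns * 1e3  # bytes/ns -> MB/s


# Illustrative assumptions, not measured figures.
OVERHEAD_NS = 1000.0   # assumed per-transaction overhead
NS_PER_BYTE = 4.0      # 250 MB/s link data rate (Gen1 x1)

single = throughput_mb_s(4, OVERHEAD_NS, NS_PER_BYTE)    # one 32-bit access
burst = throughput_mb_s(256, OVERHEAD_NS, NS_PER_BYTE)   # one 256-byte burst
```

Under these assumptions a single 32-bit access gets a few MB/s, while a 256-byte burst gets over 100 MB/s, because the fixed overhead is amortised over far more data.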
(I've not initiated transfers from the FPGA, but I suspect you'll need a DMA engine that is closely coupled to the PCIe interface.)
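Part of what such a DMA engine does is chop a buffer into burst-sized PCIe requests. Here is a hypothetical sketch of that step, assuming a max payload size of 256 bytes (the real value is negotiated per link) and applying the PCIe rule that a memory request must not cross a 4KB address boundary. The function name and defaults are illustrative, not part of any Altera IP.

```python
def split_into_tlps(addr: int, length: int, max_payload: int = 256) -> list[tuple[int, int]]:
    """Split a DMA transfer into (address, length) chunks.

    Each chunk is at most max_payload bytes and never crosses a 4 KB
    address boundary.
    """
    if max_payload <= 0 or max_payload & (max_payload - 1):
        raise ValueError("max_payload must be a positive power of two")
    chunks = []
    while length > 0:
        to_4k = 0x1000 - (addr & 0xFFF)    # bytes left before the next 4 KB boundary
        n = min(length, max_payload, to_4k)
        chunks.append((addr, n))
        addr += n
        length -= n
    return chunks


# A 512-byte transfer starting 64 bytes below a 4 KB boundary.
chunks = split_into_tlps(0x0FC0, 512)
```

Here the first chunk is cut short at the 4KB boundary, and the rest go out as full 256-byte bursts plus a tail.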