--- Quote Start ---
I am playing with the PCIe testbench simulation today. Now I understand that the BAR is mapped one-to-one between the RC and EP. I think in my case BAR0/1 is used for simple memory read/write
--- Quote End ---
Think of the BAR as the interface needed by the host (the RC). The only way the host can talk to the PCIe EP is via the BAR.
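To make that concrete: the EP compares the address of each incoming memory request against its BAR windows and only claims requests that land inside one. Here's a minimal Python sketch of that decode. The base addresses and sizes are hypothetical; on real hardware they are assigned by the BIOS/OS during enumeration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Bar:
    index: int  # BAR number (0..5)
    base: int   # host address assigned at enumeration (hypothetical here)
    size: int   # window size in bytes (always a power of two)


def decode(addr: int, bars: list) -> Optional[Tuple[int, int]]:
    """Return (bar_index, offset) if addr falls inside one of the EP's BARs."""
    for bar in bars:
        if bar.base <= addr < bar.base + bar.size:
            return bar.index, addr - bar.base
    return None  # not ours: the EP does not claim this request


# Hypothetical assignments: a 64 KiB BAR0 and a 4 KiB BAR2.
bars = [Bar(0, 0xF000_0000, 64 * 1024), Bar(2, 0xF001_0000, 4 * 1024)]
print(decode(0xF000_0010, bars))  # (0, 16)
print(decode(0xF001_0004, bars))  # (2, 4)
print(decode(0xE000_0000, bars))  # None
```

Everything the host does to the EP, including programming a DMA controller, goes through an offset inside one of these windows.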
--- Quote Start ---
and BAR2 is used for DMA.
--- Quote End ---
Not quite. You need to be clear in your descriptions, so it's clear that you understand what is going on.
The DMA controller must be a PCIe bus master and so it generates its own 64-bit addresses. If the RC needs to program the DMA controller, then the DMA control registers might live in BAR2.
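One practical detail: the DMA engine issues 64-bit addresses, but the registers the RC writes through BAR2 are often 32 bits wide, so a 64-bit host buffer address is typically programmed as a low/high register pair. A minimal Python sketch, assuming 32-bit registers and hypothetical offsets `DMA_DST_LO`/`DMA_DST_HI` (not taken from any real core), with BAR2 modeled as a dict of offset to value:

```python
MASK32 = 0xFFFF_FFFF

# Hypothetical BAR2 register offsets for the DMA controller's destination address.
DMA_DST_LO = 0x08
DMA_DST_HI = 0x0C  # high word sits 4 bytes above the low word


def write_addr64(bar2: dict, lo_off: int, addr: int) -> None:
    """Program a 64-bit bus address as two 32-bit register writes (low word first)."""
    bar2[lo_off] = addr & MASK32
    bar2[lo_off + 4] = (addr >> 32) & MASK32


def read_addr64(bar2: dict, lo_off: int) -> int:
    """What the DMA engine reassembles when it issues its own 64-bit TLP address."""
    return (bar2[lo_off + 4] << 32) | bar2[lo_off]


bar2 = {}
host_buf = 0x0000_0012_3456_7000  # hypothetical host DMA buffer address
write_addr64(bar2, DMA_DST_LO, host_buf)
print(hex(bar2[DMA_DST_LO]), hex(bar2[DMA_DST_HI]))  # 0x34567000 0x12
```

Note that `host_buf` is a host bus address, not an offset into any BAR: the RC writes it via BAR2, but the DMA controller uses it directly as a bus master.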
--- Quote Start ---
I understand the basics of DMA, which frees the CPU from handling all the data-moving work. (Doesn't it still require a way to tell the EP which address to start from?)
--- Quote End ---
The RC needs to program the EP DMA controller registers.
A simple DMA controller has registers for a source address, a destination address and the data length to transfer, plus a control register (with a "go!" bit) and a status register (with a "done!" bit).
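Here's a toy Python model of that simple controller and of what the RC driver does with it: write source, destination and length, set "go!", then poll "done!". The register offsets, bit positions and the self-clearing go bit are assumptions for illustration, and the copy happens instantly, which real hardware obviously doesn't do.

```python
# Hypothetical register offsets (byte offsets within BAR2) and bit positions.
SRC, DST, LEN, CTRL, STATUS = 0x00, 0x08, 0x10, 0x18, 0x1C
CTRL_GO = 1 << 0
STATUS_DONE = 1 << 0


class SimpleDma:
    """Toy EP DMA engine: copies LEN bytes from SRC (EP memory) to DST (host memory)."""

    def __init__(self, ep_mem: bytearray, host_mem: bytearray):
        self.ep_mem = ep_mem      # memory on the endpoint (e.g. a flash buffer)
        self.host_mem = host_mem  # host memory reached over PCIe
        self.regs = {SRC: 0, DST: 0, LEN: 0, CTRL: 0, STATUS: 0}

    def write(self, off: int, value: int) -> None:
        self.regs[off] = value
        if off == CTRL and value & CTRL_GO:
            self._run()

    def read(self, off: int) -> int:
        return self.regs[off]

    def _run(self) -> None:
        src, dst, n = self.regs[SRC], self.regs[DST], self.regs[LEN]
        self.regs[STATUS] &= ~STATUS_DONE
        self.host_mem[dst:dst + n] = self.ep_mem[src:src + n]
        self.regs[CTRL] &= ~CTRL_GO       # assumed self-clearing go bit
        self.regs[STATUS] |= STATUS_DONE


def host_transfer(dma: SimpleDma, src: int, dst: int, n: int) -> None:
    """RC driver side: program the registers, hit go, poll done."""
    dma.write(SRC, src)
    dma.write(DST, dst)
    dma.write(LEN, n)
    dma.write(CTRL, CTRL_GO)
    while not dma.read(STATUS) & STATUS_DONE:
        pass  # a real driver would sleep or wait for an interrupt


ep = bytearray(b"hello from the endpoint")
host = bytearray(32)
host_transfer(SimpleDma(ep, host), src=6, dst=0, n=4)
print(host[:4])  # bytearray(b'from')
```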
"Real" DMA controllers are more complicated than that. They have scatter-gather buffers, which are basically linked lists of data transfers to perform. The DMA controller will consume the linked list based on other register settings, e.g., "do this every time you get an interrupt". This relieves the processor of doing anything ... other than the original setup of the scatter-gather lists. The host (RC) can optionally be interrupted as DMA events occur.
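A scatter-gather list can be sketched as a chain of descriptors, each holding one transfer plus a pointer to the next. The Python model below walks the chain the way an engine would and records which entries asked for a host interrupt. The descriptor fields are hypothetical; a real core defines its own in-memory layout, and `next` would be a bus address (0 meaning end of list) rather than a Python reference.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Descriptor:
    """One scatter-gather entry (hypothetical layout)."""
    src: int
    dst: int
    length: int
    next: Optional["Descriptor"] = None  # None marks the end of the list
    irq_on_done: bool = False            # request a host interrupt after this entry


def run_chain(head: Optional[Descriptor], src_mem: bytes, dst_mem: bytearray) -> list:
    """Consume the list like a DMA engine; return indices of entries that raised an IRQ."""
    irqs = []
    desc, i = head, 0
    while desc is not None:
        dst_mem[desc.dst:desc.dst + desc.length] = src_mem[desc.src:desc.src + desc.length]
        if desc.irq_on_done:
            irqs.append(i)
        desc, i = desc.next, i + 1
    return irqs


# Gather two non-contiguous source pieces into one contiguous host buffer,
# interrupting the host only once, after the last entry.
src = b"AAAA....BBBB"
dst = bytearray(8)
d2 = Descriptor(src=8, dst=4, length=4, irq_on_done=True)
d1 = Descriptor(src=0, dst=0, length=4, next=d2)
print(run_chain(d1, src, dst), dst)  # [1] bytearray(b'AAAABBBB')
```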
--- Quote Start ---
At the EP, I have 20 Gbytes of NAND flash. Should I make the BAR bigger, or is there another way around it? lol I don't think I can make a BAR 20G, right?
--- Quote End ---
No. If you make your BAR 20G, you will not be able to boot your PC; the BIOS will choke. As described in the notes I linked to in the other thread, I could not boot my EliteBook laptop when I made the BAR too big.
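Part of the reason a huge BAR hurts: a BAR is always a naturally aligned power of two, and firmware discovers its size by writing all 1s to it and reading back the mask (the standard PCI BAR sizing procedure). Here's a small Python sketch of that arithmetic, assuming "20G" means 20 GiB and a 64-bit prefetchable memory BAR. It shows the request rounds up to a 32 GiB window of address space that the BIOS would have to find room for.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def bar_size_from_readback(lo: int, hi: int) -> int:
    """Size of a 64-bit memory BAR from the values read back after writing all 1s.

    The low 4 bits of a memory BAR are type/prefetch flags, so they are masked off;
    the size is the two's complement of the remaining address mask.
    """
    mask = ((hi << 32) | (lo & ~0xF)) & MASK64
    return ((~mask) & MASK64) + 1


def bar_size_for(nbytes: int) -> int:
    """A BAR must be a power of two, so a request rounds up to the next one."""
    return 1 << (nbytes - 1).bit_length()


GiB = 1 << 30
print(bar_size_for(20 * GiB) // GiB)                           # 32
print(bar_size_from_readback(0x0000_000C, 0xFFFF_FFF8) // GiB)  # 32
```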
If you want the RC to get data from the 20G drive, then it has to program the DMA controller with source addresses that correspond to the 20G drive (or to memory buffers that the drive creates), and then transfer those buffers to the host using DMA.
If you dig into the filesystem for your OS, you'll find that it works in pages/sectors, i.e., blocks of bytes. Your DMA controller needs to move a requested page of bytes from the PCIe EP over the PCIe bus to the host memory (where it is possibly copied into a page that the filesystem driver gave you).
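Putting that together, a read of a few filesystem pages turns into one DMA transfer per page: the source is the page's location on the EP side, the destination is whichever host page the driver was handed. A minimal Python sketch, assuming a 4 KiB page, a flat EP-side address space for the flash starting at a hypothetical `flash_base`, and one (possibly non-contiguous) host page per flash page:

```python
PAGE = 4096  # assumed filesystem page/block size


def page_read_to_dma(first_page: int, npages: int, host_pages: list,
                     flash_base: int = 0) -> list:
    """Turn a read of npages consecutive flash pages into (src, dst, length) transfers.

    host_pages: host bus addresses of the pages the filesystem driver gave us.
    flash_base: hypothetical EP-side address where the flash (or its buffer) starts.
    """
    if len(host_pages) != npages:
        raise ValueError("need one host page per flash page")
    transfers = []
    for i, dst in enumerate(host_pages):
        src = flash_base + (first_page + i) * PAGE
        transfers.append((src, dst, PAGE))
    return transfers


print(page_read_to_dma(10, 2, [0x1000_0000, 0x2000_3000]))
# [(40960, 268435456, 4096), (45056, 536883200, 4096)]
```

Each tuple maps naturally onto one scatter-gather descriptor, so the whole request can be handed to the DMA controller in one go, and the 20G of flash never needs to appear in a BAR.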
Something like that anyway ...
Cheers,
Dave