--- Quote Start ---
By cacheable BAR I mean BAR that can be cached by Intel processor cache.
--- Quote End ---
That is not a function of the PCIe device; it's a function of the Intel processor. The only PCIe bus feature you can control via the configuration registers is whether the memory region is marked prefetchable or not. There are some cache line registers, but they have an effect during DMA, and for bridges (at least under PCI).
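As a rough illustration of the prefetchable bit mentioned above, here is a minimal sketch in C that decodes the low flag bits of a raw 32-bit BAR value as read from configuration space. The macro and function names are made up for this example (they are not from any kernel header); the bit positions follow the standard PCI BAR layout: bit 0 selects I/O versus memory space, bits 2:1 give the memory type, and bit 3 is the prefetchable flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names only; bit positions follow the standard PCI BAR layout. */
#define BAR_SPACE_IO       0x1u  /* bit 0: 1 = I/O space, 0 = memory space   */
#define BAR_MEM_TYPE_MASK  0x6u  /* bits 2:1: 00 = 32-bit, 10 = 64-bit       */
#define BAR_MEM_TYPE_64    0x4u
#define BAR_MEM_PREFETCH   0x8u  /* bit 3: memory region is prefetchable     */

/* True if the BAR describes a memory region rather than an I/O region. */
static inline bool bar_is_memory(uint32_t bar)
{
    return (bar & BAR_SPACE_IO) == 0;
}

/* True if the BAR is a memory BAR with the prefetchable bit set.
 * Bit 3 has no such meaning for I/O BARs, so those always return false. */
static inline bool bar_is_prefetchable(uint32_t bar)
{
    return bar_is_memory(bar) && (bar & BAR_MEM_PREFETCH) != 0;
}

/* True if the BAR is a 64-bit memory BAR (it then occupies two BAR slots). */
static inline bool bar_is_64bit(uint32_t bar)
{
    return bar_is_memory(bar) &&
           (bar & BAR_MEM_TYPE_MASK) == BAR_MEM_TYPE_64;
}
```

Note that this flag only describes the device's memory region to the host. It does not by itself make the processor cache the BAR.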
--- Quote Start ---
Typically, BARs are not cached by processor cache, however, in this case caching is desirable.
I am using Linux, CentOS 5 (2.6.18).
I modified the MTRR settings to exclude the BAR from uncached regions. Also, I wrote a driver that creates a bin_attribute in /sys/... with a custom mmap() function that maps the BAR into user space without setting the _PAGE_PCD | _PAGE_PWT page flags.
When the BAR is mmapped into user space I can issue reads to it and observe caching behavior, i.e. a second read to the same address does not go to the FPGA.
However, when I try to issue a write to the same BAR, the system reboots without any message on the screen or in the logs.
So, I am wondering whether I am doing something wrong in the driver/settings, or whether the Stratix IV PCIe implementation does not support some feature that is needed for this to work properly?
--- Quote End ---
Writes should always go through to the PCIe endpoint. If you want higher performance for your writes, trying to manipulate the processor cache is the wrong way to go. You need to implement a DMA master at the PCIe endpoint and have it DMA from main host memory.
Trying to play games with caches is asking for trouble, since you cannot snoop the cache, and so basically cannot keep it consistent.
Cheers,
Dave