Thanks for the pointer to the tool! We have a similar thing: a DLL for Tcl, which is quite flexible and convenient.
I tried CentOS 6 (kernel 2.6.32), which does support PAT and write-combining and provides the ioremap_cache() function.
Write-combining/non-temporal writes (AVX/SSE) work as expected.
Reads work as expected when the PCI BAR is mmapped as cached. Non-temporal full cache line writes also work as expected in this case. However, simple writes still cause the system to freeze and then reboot.
I also tried calling ioremap_cache() and then iowrite32() in the driver code, and this also causes a system freeze/reboot.
The system is a Sandy Bridge i7.
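
For reference, a non-temporal full cache line write of the kind mentioned above could look roughly like the sketch below. It is only an illustration, not the exact code used in the tests: the function name write_line_nt and the CACHE_LINE constant are made up for the example, it assumes a 64-byte cache line and SSE2, and the destination would be a 64-byte-aligned address inside the mmapped BAR (an ordinary aligned buffer behaves the same way for the purposes of the code).

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_stream_si128, _mm_loadu_si128 */
#include <stdint.h>

#define CACHE_LINE 64   /* assumed cache line size in bytes */

/* Hypothetical helper: copy one full cache line from src to dst using
 * non-temporal (streaming) stores. dst must be 64-byte aligned. */
static void write_line_nt(void *dst, const void *src)
{
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    int i;

    assert(((uintptr_t)dst % CACHE_LINE) == 0);

    /* Four 16-byte streaming stores cover the whole 64-byte line. */
    for (i = 0; i < CACHE_LINE / 16; i++)
        _mm_stream_si128(&d[i], _mm_loadu_si128(&s[i]));

    /* Make the streaming stores globally visible before continuing. */
    _mm_sfence();
}
```

The point of writing the whole line with streaming stores and then fencing is that the line never has to be read before it is written, which is different from what a simple 32-bit store to a cached mapping does.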