I would recommend writing a kernel module that provides the interface the user-level process needs. Inside the module, you can call set_memory_uc to mark the relevant pages as uncacheable.
As for the simulator: it should be roughly ten to a thousand times slower than native execution, not a million times, as long as you do not model at the gate level. Also factor in the time it takes to write the kernel module. If writing and debugging it would take you several weeks, a simulator may be the better choice for a one-time experiment.
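To make that trade-off concrete, here is a rough back-of-the-envelope sketch in C. The function names and all the numbers in the usage note are hypothetical, chosen only for illustration; plug in your own estimates. It assumes the simulator needs no setup time of its own and that all times are measured in the same unit.

```c
#include <assert.h>

/* Total time for the native route: write the kernel module, then run. */
double native_total_hours(double dev_hours, double run_hours)
{
    return dev_hours + run_hours;
}

/* Total time for the simulator route: the same run, slowed down.
 * slowdown is the simulator's slowdown factor (e.g. 10 to 1000). */
double simulated_total_hours(double run_hours, double slowdown)
{
    assert(slowdown >= 1.0);
    return run_hours * slowdown;
}

/* Slowdown factor at which both routes take equally long. */
double break_even_slowdown(double dev_hours, double run_hours)
{
    assert(run_hours > 0.0);
    return (dev_hours + run_hours) / run_hours;
}

/* Returns 1 if the simulator route finishes sooner, 0 otherwise. */
int simulator_is_faster(double dev_hours, double run_hours, double slowdown)
{
    return simulated_total_hours(run_hours, slowdown)
         < native_total_hours(dev_hours, run_hours);
}
```

For example, with an assumed 120 hours of module development and a 1 hour native run, the break-even slowdown is 121: a simulator that is 100 times slower still finishes first, while one that is 1000 times slower does not. If the experiment has to be repeated many times, the development cost is spread over all runs and the native route becomes more attractive.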
Mackie messer