Mostly for weird things like cacheable MMIO regions, I think.
Skylake introduced CLFLUSHOPT, a weakly-ordered, higher-performance variant of CLFLUSH, because it is useful for non-volatile storage connected directly to the memory hierarchy. Flushing the cache ensures that the data has been written to the actual memory and is not still sitting dirty in the CPU's cache.
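As a rough illustration of the "flush so the data reaches memory" use case, here is a minimal sketch in C. The helper name `flush_range` and the 64-byte `CACHE_LINE` constant are my own assumptions for the example, not part of any library; real code should query the line size from CPUID and check for CLFLUSHOPT support at run time. It uses `_mm_clflushopt` only when the compiler was told the target supports it (e.g. `-mclflushopt`), and otherwise falls back to plain `_mm_clflush`.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, true on current x86 CPUs.
 * Production code should read the line size from CPUID instead. */
#define CACHE_LINE 64

/* Hypothetical helper: flush every cache line overlapping
 * [addr, addr + len) and return how many lines were flushed. */
static size_t flush_range(void *addr, size_t len)
{
    if (len == 0)
        return 0;

    /* Round the start down to a cache-line boundary. */
    uintptr_t start = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = (uintptr_t)addr + len;
    size_t lines = 0;

    for (uintptr_t p = start; p < end; p += CACHE_LINE) {
#ifdef __CLFLUSHOPT__
        _mm_clflushopt((void *)p);  /* weakly ordered, flushes can overlap */
#else
        _mm_clflush((void *)p);     /* baseline SSE2 instruction */
#endif
        lines++;
    }

    /* CLFLUSHOPT is weakly ordered, so fence before any later store
     * that must not become visible ahead of the flushed data. */
    _mm_sfence();
    return lines;
}
```

Typical use would be to write a record into a persistent-memory mapping, call `flush_range(record, sizeof *record)`, and only then store a "valid" flag. Flushing does not change the data as seen by the program; it only forces dirty lines out of the cache.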
This potentially also matters for non-cache-coherent DMA devices, if anything can do that on x86. (Probably not; I think all DMA is cache-coherent these days.)
I'm not an expert on this, and this isn't meant to be a complete answer covering all use cases.
Peter Cordes