Hi everyone,
While I'm working with a tiered memory system e.g. CXL memory, I have
been facing migration overhead esp. tlb shootdown on promotion or
demotion between different tiers. Yeah.. most tlb shootdowns on
migration through hinting fault can be avoided thanks to Huang Ying's
work, commit 4d4b6d66db ("mm,unmap: avoid flushing tlb in batch if PTE
is inaccessible"). See the following link for more information:
https://lore.kernel.org/lkml/20231115025755.GA29979@xxxxxxxxxxxxxxxxxxx/
However, it's only for migration through hinting fault. I thought it'd
be much better if we have a general mechanism to reduce all the tlb
numbers that we can apply to any unmap code, that we normally believe
tlb flush should be followed.
I'm suggesting a new mechanism, LUF(Lazy Unmap Flush), defers tlb flush
until folios that have been unmapped and freed, eventually get allocated
again. It's safe for folios that had been mapped read-only and were
unmapped, since the contents of the folios don't change while staying in
pcp or buddy so we can still read the data through the stale tlb entries.
tlb flush can be defered when folios get unmapped as long as it
guarantees to perform tlb flush needed, before the folios actually
become used, of course, only if all the corresponding ptes don't have
write permission. Otherwise, the system will get messed up.
To achieve that:
1. For the folios that map only to non-writable tlb entries, prevent
tlb flush during unmapping but perform it just before the folios
actually become used, out of buddy or pcp.