Re: [RFC 2/2] mm: Defer TLB flush by keeping both src and dst folios at migration

From: Byungchul Park
Date: Tue Aug 15 2023 - 22:44:40 EST


On Wed, Aug 16, 2023 at 09:01:12AM +0800, Huang, Ying wrote:
> Byungchul Park <byungchul@xxxxxx> writes:
>
> > On Tue, Aug 15, 2023 at 09:27:26AM +0800, Huang, Ying wrote:
> >> Byungchul Park <byungchul@xxxxxx> writes:
> >>
> >> > Implementation of CONFIG_MIGRC that stands for 'Migration Read Copy'.
> >> >
> >> > We always face the migration overhead at either promotion or demotion,
> >> > while working with tiered memory e.g. CXL memory and found out TLB
> >> > shootdown is a quite big one that is needed to get rid of if possible.
> >> >
> >> > Fortunately, TLB flush can be deferred or even skipped if both source and
> >> > destination of folios during migration are kept until all TLB flushes
> >> > required will have been done, of course, only if the target PTE entries
> >> > have read only permission, more precisely speaking, don't have write
> >> > permission. Otherwise, no doubt the folio might get messed up.
> >> >
> >> > To achieve that:
> >> >
> >> > 1. For the folios that have only non-writable TLB entries, prevent
> >> > TLB flush by keeping both source and destination of folios during
> >> > migration, which will be handled later at a better time.
> >> >
> >> > 2. When any non-writable TLB entry changes to writable e.g. through
> >> > fault handler, give up CONFIG_MIGRC mechanism so as to perform
> >> > TLB flush required right away.
> >> >
> >> > 3. TLB flushes can be skipped if all TLB flushes required to free the
> >> > duplicated folios have been done by any reason, which doesn't have
> >> > to be done from migrations.
> >> >
> >> > 4. Adjust watermark check routine, __zone_watermark_ok(), with the
> >> > number of duplicated folios because those folios can be freed
> >> > and obtained right away through appropriate TLB flushes.
> >> >
> >> > 5. Perform TLB flushes and free the duplicated folios pending the
> >> > flushes if page allocation routine is in trouble due to memory
> >> > pressure, even more aggressively for high order allocation.
> >>
> >> Is the optimization restricted for page migration only? Can it be used
> >> for other places? Like page reclaiming?
> >
> > Just to make sure, are you talking about the (5) description? For now,
> > it's performed at the beginning of __alloc_pages_slowpath(), say, before
> > page reclaiming. Do you think it'd be meaningful to perform it during page
> > reclaiming? Or do you mean something else?
>
> Not for (5). TLB needs to be flushed during page reclaiming too. Can
> similar method be used to reduce TLB flushing there too?

Hm.. The mechanism can be used in any place where a page mapping is
changing, but it requires that the mapping not have write permission,
since a write could break consistency between the multiple copies of
the page.
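
To illustrate the condition, here is a toy user-space sketch. It is not
the code in the patch: struct toy_pte and toy_can_defer_flush() are
made-up names, and a real implementation would have to look at the
actual PTEs under the proper locks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a PTE; not the kernel's pte_t. */
struct toy_pte {
	bool present;	/* the entry maps the page */
	bool writable;	/* the entry grants write permission */
};

/*
 * Return true if the TLB flush for a mapping change could be deferred
 * while both the old and the new copy of the page are kept.
 *
 * A present, writable entry forces an immediate flush: a stale writable
 * TLB entry would let one CPU write to the old copy while other CPUs
 * already read the new one.
 */
static bool toy_can_defer_flush(const struct toy_pte *ptes, size_t n)
{
	assert(ptes != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (ptes[i].present && ptes[i].writable)
			return false;
	}
	return true;
}
```

With only read-only entries the function allows deferring the flush. As
soon as one entry is writable it does not, which corresponds to giving
up the mechanism and flushing right away.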

JFYI, one piece of future work is to detect read-mostly pages and turn
them read-only so that they can use this mechanism, iff it gives better
performance.

Byungchul