Re: [PATCH v2 5/5] mm: add per-order mTHP swpin_refault counter

From: Barry Song
Date: Thu Apr 11 2024 - 19:02:02 EST


On Fri, Apr 12, 2024 at 3:53 AM Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
>
> On 09/04/2024 09:26, Barry Song wrote:
> > From: Barry Song <v-songbaohua@xxxxxxxx>
> >
> > Currently, we are handling the scenario where we've hit a
> > large folio in the swapcache, and the reclaiming process
> > for this large folio is still ongoing.
> >
> > Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx>
> > ---
> > include/linux/huge_mm.h | 1 +
> > mm/huge_memory.c | 2 ++
> > mm/memory.c | 1 +
> > 3 files changed, 4 insertions(+)
> >
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index c8256af83e33..b67294d5814f 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -269,6 +269,7 @@ enum mthp_stat_item {
> > MTHP_STAT_ANON_ALLOC_FALLBACK,
> > MTHP_STAT_ANON_SWPOUT,
> > MTHP_STAT_ANON_SWPOUT_FALLBACK,
> > + MTHP_STAT_ANON_SWPIN_REFAULT,
>
> I don't see any equivalent counter for small folios. Is there an analogue?

Indeed, we don't count refaults for small folios, as their refault
mechanism is much
simpler compared to large folios. Implementing this counter can enhance the
system's visibility to users.

Personally, having this counter and observing a non-zero value greatly enhances
my confidence when debugging this refault series. Otherwise, it feels like being
blind to what's happening inside the system :-)

>
> > __MTHP_STAT_COUNT
> > };
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index d8d2ed80b0bf..fb95345b0bde 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -556,12 +556,14 @@ DEFINE_MTHP_STAT_ATTR(anon_alloc, MTHP_STAT_ANON_ALLOC);
> > DEFINE_MTHP_STAT_ATTR(anon_alloc_fallback, MTHP_STAT_ANON_ALLOC_FALLBACK);
> > DEFINE_MTHP_STAT_ATTR(anon_swpout, MTHP_STAT_ANON_SWPOUT);
> > DEFINE_MTHP_STAT_ATTR(anon_swpout_fallback, MTHP_STAT_ANON_SWPOUT_FALLBACK);
> > +DEFINE_MTHP_STAT_ATTR(anon_swpin_refault, MTHP_STAT_ANON_SWPIN_REFAULT);
> >
> > static struct attribute *stats_attrs[] = {
> > &anon_alloc_attr.attr,
> > &anon_alloc_fallback_attr.attr,
> > &anon_swpout_attr.attr,
> > &anon_swpout_fallback_attr.attr,
> > + &anon_swpin_refault_attr.attr,
> > NULL,
> > };
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 9818dc1893c8..acc023795a4d 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -4167,6 +4167,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> > nr_pages = nr;
> > entry = folio->swap;
> > page = &folio->page;
> > + count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_SWPIN_REFAULT);
>
> I don't think this is the point of no return yet? There's the pte_same() check
> immediately below (although I've suggested that needs to be moved to earlier),
> but also the folio_test_uptodate() check. Perhaps this should go after that?
>

swap_pte_batch() == nr_pages should have passed the test for pte_same.
folio_test_uptodate(folio)) should be also unlikely to be true as we are
not reading from swap devices for refault case.

but i agree we can move all the refault handling after those two "goto
out_nomap".

> > }
> >
> > check_pte:
>

Thanks
Barry