Re: [PATCH] mm: deduct the number of pages reclaimed by madvise from workingset

From: Zhaoyang Huang
Date: Wed May 24 2023 - 21:23:37 EST


On Thu, May 25, 2023 at 4:41 AM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
>
> On Wed, May 24, 2023 at 2:13 AM zhaoyang.huang
> <zhaoyang.huang@xxxxxxxxxx> wrote:
> >
> > From: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
> >
> > The pages reclaimed by madvise_pageout are deactivated and then
> > dropped from the LRU forcefully, which leaves the pages that refault
> > afterwards with a larger refault distance than they should have. This
> > can hurt the accuracy of thrashing detection when madvise_pageout is
> > used as a common way of reclaiming memory, as Android does now.
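To make the effect concrete, below is a toy userspace model of the
refault-distance arithmetic in mm/workingset.c (memcg hierarchy,
bucket shifting and folio flags omitted; the numbers are made up).
Every page dropped by madvise_pageout advances the nonresident age,
so a later refault sees an inflated distance and may no longer be
counted as thrashing:

/*
 * Toy model of lruvec->nonresident_age bookkeeping. This is an
 * illustration, not kernel code.
 */
#include <stdio.h>

int main(void)
{
	unsigned long nonresident_age = 1000;	/* lruvec->nonresident_age */
	unsigned long workingset_size = 200;	/* size of the active set */

	/* Eviction snapshots the counter into the page's shadow entry. */
	unsigned long eviction = nonresident_age;

	/* Normal LRU reclaim ages the counter by the evicted pages... */
	nonresident_age += 50;
	/* ...and madvise_pageout bumps it for forcefully dropped pages too. */
	nonresident_age += 500;

	/* On refault, the distance is how far the counter moved meanwhile. */
	unsigned long refault_distance = nonresident_age - eviction;

	printf("refault distance %lu: %s\n", refault_distance,
	       refault_distance <= workingset_size ?
			"counted as workingset" : "not counted as workingset");
	return 0;
}

Without the 500 forced drops the distance would be 50 and the
refaulting page would be re-activated as workingset; with them it is
550 and the refault is treated as cold.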
>
> Doesn't workingset_eviction() in the following call chain already
> handle nonresident page aging?:
>
> reclaim_pages
> reclaim_folio_list
> shrink_folio_list
> __remove_mapping
> workingset_eviction
> workingset_age_nonresident
Yes. What I suggest is to subtract these pages from the nonresident
age, since they are dropped forcefully rather than aged off the LRU.
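For reference, workingset_age_nonresident() only advances the counter
up the memcg hierarchy (quoted from mm/workingset.c on recent kernels,
comment mine), so passing a negative nr_pages wraps the unsigned
addition around and effectively rewinds the eviction clock:

void workingset_age_nonresident(struct lruvec *lruvec, unsigned long nr_pages)
{
	/* Advance the eviction clock of this lruvec and all its ancestors. */
	do {
		atomic_long_add(nr_pages, &lruvec->nonresident_age);
	} while ((lruvec = parent_lruvec(lruvec)));
}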
>
>
> >
> > Signed-off-by: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
> > ---
> >  include/linux/swap.h | 2 +-
> >  mm/madvise.c         | 4 ++--
> >  mm/vmscan.c          | 8 +++++++-
> >  3 files changed, 10 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > index 2787b84..0312142 100644
> > --- a/include/linux/swap.h
> > +++ b/include/linux/swap.h
> > @@ -428,7 +428,7 @@ extern unsigned long mem_cgroup_shrink_node(struct mem_cgroup *mem,
> >  extern int vm_swappiness;
> >  long remove_mapping(struct address_space *mapping, struct folio *folio);
> > 
> > -extern unsigned long reclaim_pages(struct list_head *page_list);
> > +extern unsigned long reclaim_pages(struct mm_struct *mm, struct list_head *page_list);
> >  #ifdef CONFIG_NUMA
> >  extern int node_reclaim_mode;
> >  extern int sysctl_min_unmapped_ratio;
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index b6ea204..61c8d7b 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -420,7 +420,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> >  huge_unlock:
> >  	spin_unlock(ptl);
> >  	if (pageout)
> > -		reclaim_pages(&page_list);
> > +		reclaim_pages(mm, &page_list);
> >  	return 0;
> >  }
> >
> > @@ -516,7 +516,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
> >  	arch_leave_lazy_mmu_mode();
> >  	pte_unmap_unlock(orig_pte, ptl);
> >  	if (pageout)
> > -		reclaim_pages(&page_list);
> > +		reclaim_pages(mm, &page_list);
> >  	cond_resched();
> > 
> >  	return 0;
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 20facec..048c10b 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2741,12 +2741,14 @@ static unsigned int reclaim_folio_list(struct list_head *folio_list,
> >  	return nr_reclaimed;
> >  }
> > 
> > -unsigned long reclaim_pages(struct list_head *folio_list)
> > +unsigned long reclaim_pages(struct mm_struct *mm, struct list_head *folio_list)
>
> You would also need to change DAMON's usage of reclaim_pages() here:
> https://elixir.bootlin.com/linux/v6.4-rc1/source/mm/damon/paddr.c#L253
OK, thanks for the reminder.
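Since DAMON's paddr scheme has no mm to attribute the folios to, one
option (an untested sketch) would be to pass NULL there and rely on
get_mem_cgroup_from_mm() falling back to the root memcg for a NULL
mm, which it already does:

	/* mm/damon/paddr.c, damon_pa_pageout(): hypothetical fixup */
	applied = reclaim_pages(NULL, &folio_list);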
>
> >  {
> >  	int nid;
> >  	unsigned int nr_reclaimed = 0;
> >  	LIST_HEAD(node_folio_list);
> >  	unsigned int noreclaim_flag;
> > +	struct lruvec *lruvec;
> > +	struct mem_cgroup *memcg = get_mem_cgroup_from_mm(mm);
> > 
> >  	if (list_empty(folio_list))
> >  		return nr_reclaimed;
> > @@ -2764,10 +2766,14 @@ unsigned long reclaim_pages(struct list_head *folio_list)
> >  		}
> > 
> >  		nr_reclaimed += reclaim_folio_list(&node_folio_list, NODE_DATA(nid));
> > +		lruvec = &memcg->nodeinfo[nid]->lruvec;
> > +		workingset_age_nonresident(lruvec, -nr_reclaimed);
> >  		nid = folio_nid(lru_to_folio(folio_list));
> >  	} while (!list_empty(folio_list));
> > 
> >  	nr_reclaimed += reclaim_folio_list(&node_folio_list, NODE_DATA(nid));
> > +	lruvec = &memcg->nodeinfo[nid]->lruvec;
> > +	workingset_age_nonresident(lruvec, -nr_reclaimed);
> > 
> >  	memalloc_noreclaim_restore(noreclaim_flag);
> >
> > --
> > 1.9.1
> >