Re: Regression of madvise(MADV_COLD) on shmem?

From: Yu Zhao
Date: Thu Mar 10 2022 - 19:09:43 EST


On Thu, Mar 10, 2022 at 2:01 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Mon 07-03-22 13:10:08, Michal Hocko wrote:
> > On Sat 05-03-22 02:17:37, Yu Zhao wrote:
> > [...]
> > > diff --git a/mm/swap.c b/mm/swap.c
> > > index bcf3ac288b56..7fd99f037ca7 100644
> > > --- a/mm/swap.c
> > > +++ b/mm/swap.c
> > > @@ -563,7 +559,7 @@ static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec)
> > >
> > >  static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec)
> > >  {
> > > -	if (PageActive(page) && !PageUnevictable(page)) {
> > > +	if (!PageUnevictable(page)) {
> > >  		int nr_pages = thp_nr_pages(page);
> > >
> > >  		del_page_from_lru_list(page, lruvec);
> > > @@ -677,7 +673,7 @@ void deactivate_file_page(struct page *page)
> > >   */
> > >  void deactivate_page(struct page *page)
> > >  {
> > > -	if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
> > > +	if (PageLRU(page) && !PageUnevictable(page)) {
> > >  		struct pagevec *pvec;
> > >
> > >  		local_lock(&lru_pvecs.lock);
> > >
> > > I'll leave it to Minchan to decide whether this is worth fixing,
> > > together with this one:
> >
> > There doesn't seem to be any dependency on the PageActive anymore. I do
> > remember we have relied on the PageActive to move from the active list
> > to the inactive. This is not the case anymore but I am wondering whether
> > above is really sufficient. If you are deactivating an inactive page
> > then I would expect you want to move that page in the LRU as well. In
> > other words don't you want
> > 	if (page_active)
> > 		add_page_to_lru_list
> > 	else
> > 		add_page_to_lru_list_tail

Yes, this is better.
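
To make the head-vs-tail distinction concrete, here is a minimal
userspace sketch of the suggestion above. It is not kernel code: the
toy_* names and the sentinel-based list are made up for illustration,
and it assumes reclaim scans the inactive list from the tail.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page; hypothetical, not the kernel type. */
struct toy_page {
	bool active;
	struct toy_page *prev, *next;
};

/* Circular list with a sentinel: head.next is the head, head.prev the tail. */
struct toy_lru {
	struct toy_page head;
};

static void toy_lru_init(struct toy_lru *lru)
{
	lru->head.prev = lru->head.next = &lru->head;
}

static void toy_lru_del(struct toy_page *page)
{
	page->prev->next = page->next;
	page->next->prev = page->prev;
}

/* Insert at the head, i.e. furthest from where the toy reclaim would scan. */
static void toy_lru_add(struct toy_lru *lru, struct toy_page *page)
{
	page->next = lru->head.next;
	page->prev = &lru->head;
	lru->head.next->prev = page;
	lru->head.next = page;
}

/* Insert at the tail, i.e. first in line for the toy reclaim. */
static void toy_lru_add_tail(struct toy_lru *lru, struct toy_page *page)
{
	page->prev = lru->head.prev;
	page->next = &lru->head;
	lru->head.prev->next = page;
	lru->head.prev = page;
}

/*
 * Deactivate a page: a formerly active page goes to the head of the
 * inactive list, a page that was already inactive goes to its tail.
 */
static void toy_deactivate(struct toy_lru *inactive, struct toy_page *page)
{
	bool was_active = page->active;

	toy_lru_del(page);
	page->active = false;
	if (was_active)
		toy_lru_add(inactive, page);
	else
		toy_lru_add_tail(inactive, page);
}
```

With this, calling toy_deactivate() on a page that is already inactive
is no longer a NOP: the page moves to the tail of the inactive list.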

> Do you plan to send an official patch?

One thing I still haven't thought through is why the A-bit couldn't
protect the blob in the test. In theory, the A-bit alone should be
enough, even though deactivate_page() is a NOP for pages that are
already inactive.

1. all pages are initially inactive and have the A-bit set
2. madvise(MADV_COLD) clears the A-bit for zero-filled pages (but
   fails to change their LRU positions)
3. the memcg hits the limit
4. pages in the blob are moved to the active LRU because those pages
   still have the A-bit (zero-filled pages remain inactive)
5. inactive_is_low() tests true and the blob gets deactivated???

The last step doesn't make sense, since the inactive list is still very large.
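
For reference, a rough userspace sketch of the size check as I recall
it from mm/vmscan.c of this era. Treat the formula as an approximation
from memory rather than a copy of the kernel source. The toy_* names
are made up, and 4 KiB pages are assumed.

```c
#include <stdbool.h>

#define TOY_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Integer square root by linear search; stand-in for the kernel's int_sqrt(). */
static unsigned long toy_int_sqrt(unsigned long x)
{
	unsigned long r = 0;

	while ((r + 1) * (r + 1) <= x)
		r++;
	return r;
}

/*
 * Approximation of the check: the inactive list counts as "low" only if
 * inactive * ratio < active, where ratio grows with the total LRU size
 * in GiB and is 1 below 1 GiB. Both arguments are page counts.
 */
static bool toy_inactive_is_low(unsigned long inactive, unsigned long active)
{
	unsigned long gb = (inactive + active) >> (30 - TOY_PAGE_SHIFT);
	unsigned long ratio = gb ? toy_int_sqrt(10 * gb) : 1;

	return inactive * ratio < active;
}
```

With hypothetical sizes, say 200000 inactive zero-filled pages and a
50000-page blob on the active list, toy_inactive_is_low(200000, 50000)
is false, which is why step 5 looks wrong to me.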

Thanks.