Re: [PATCH v4 01/10] mm: vmscan: add validation before splitting shmem large folio

From: Daniel Gomez
Date: Thu Aug 08 2024 - 06:58:04 EST


On Thu, Aug 08, 2024 at 12:48:52PM GMT, Daniel Gomez wrote:
> On Thu, Aug 08, 2024 at 05:34:50PM GMT, Baolin Wang wrote:
> >
> >
> > On 2024/8/8 16:51, David Hildenbrand wrote:
> > > On 08.08.24 04:36, Baolin Wang wrote:
> > > >
> > > >
> > > > On 2024/8/7 23:53, David Hildenbrand wrote:
> > > > > On 07.08.24 09:31, Baolin Wang wrote:
> > > > > > Page reclaim will not scan the anon LRU if there is no swap space;
> > > > > > however, MADV_PAGEOUT can still split shmem large folios even without
> > > > > > a swap device. Thus, validate that swap space is available before
> > > > > > splitting a shmem large folio, to avoid a redundant split.
> > > > > >
> > > > > > Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
> > > > > > ---
> > > > > >    mm/vmscan.c | 8 ++++++++
> > > > > >    1 file changed, 8 insertions(+)
> > > > > >
> > > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > > > index 31d13462571e..796f65781f4f 100644
> > > > > > --- a/mm/vmscan.c
> > > > > > +++ b/mm/vmscan.c
> > > > > > @@ -1259,6 +1259,14 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> > > > > >  			}
> > > > > >  		} else if (folio_test_swapbacked(folio) &&
> > > > > >  			   folio_test_large(folio)) {
> > > > > > +
> > > > > > +			/*
> > > > > > +			 * Do not split shmem folio if no swap memory
> > > > > > +			 * available.
> > > > > > +			 */
> > > > > > +			if (!total_swap_pages)
> > > > > > +				goto activate_locked;
> > > > > > +
> > > > > >  			/* Split shmem folio */
> > > > > >  			if (split_folio_to_list(folio, folio_list))
> > > > > >  				goto keep_locked;
> > > > >
> > > > > Reminds me of
> > > > >
> > > > > commit 9a976f0c847b67d22ed694556a3626ed92da0422
> > > > > Author: Luis Chamberlain <mcgrof@xxxxxxxxxx>
> > > > > Date:   Thu Mar 9 15:05:43 2023 -0800
> > > > >
> > > > >     shmem: skip page split if we're not reclaiming
> > > > >
> > > > >     In theory when info->flags & VM_LOCKED we should not be getting
> > > > >     shem_writepage() called so we should be verifying this with a
> > > > >     WARN_ON_ONCE().  Since we should not be swapping then best to ensure
> > > > >     we also don't do the folio split earlier too.  So just move the check
> > > > >     early to avoid folio splits in case its a dubious call.
> > > > >
> > > > >     We also have a similar early bail when !total_swap_pages so just
> > > > >     move that earlier to avoid the possible folio split in the same
> > > > >     situation.
> > > > >
> > > > >
> > > > > But indeed, pageout() -> writepage() is called *after* the split in the
> > > > > vmscan path.
> > > > >
> > > > > In that "noswap" context, I wonder if we also want to skip folios that
> > > > > are part of shmem with swapping disabled?
> > > >
> > > > Yes, I think so.
> > > >
> > > > >
> > > > > But now I am wondering under which circumstances we end up calling
> > > > > shmem_writepage() with a large folio. And I think the answer is the
> > > > > comment of
> > > > > folio_test_large(): via drivers/gpu/drm/i915/gem/i915_gem_shmem.c.
> > > > >
> > > > >
> > > > > ... so if shmem_writepage() handles+checks that, could we do
> > > > >
> > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > > index a332cb80e928..7dfa3d6e8ba7 100644
> > > > > --- a/mm/vmscan.c
> > > > > +++ b/mm/vmscan.c
> > > > > @@ -1257,11 +1257,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> > > > >  					goto activate_locked_split;
> > > > >  				}
> > > > >  			}
> > > > > -		} else if (folio_test_swapbacked(folio) &&
> > > > > -			   folio_test_large(folio)) {
> > > > > -			/* Split shmem folio */
> > > > > -			if (split_folio_to_list(folio, folio_list))
> > > > > -				goto keep_locked;
> > > > >  		}
> > > > >
> > > > >  		/*
> > > > > instead?
> > > >
> > > > Seems reasonable to me. But we should pass 'folio_list' to
> > > > shmem_writepage() to hold the subpages of the split large folio. Let me try.
> > >
> > > Ah, yes, good point. Alternatively, we'd have to split and try writing
> > > all subpages in there. I wonder what to do if we fail to write some, and
> > > if we could handle that transparently, without the folio_list.
> >
> > After some investigation, I prefer to pass 'folio_list' to shmem_writepage()
> > via 'struct writeback_control', which simplifies the logic a lot. Otherwise,
> > we would need to handle each subpage's writeback/reclaim/dirty state, as well
> > as track each subpage's write result, which seems more complicated.
> >
> > I made a quick change by passing 'folio_list', and it looks simple and works
> > as expected.
> >
> > diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> > index 75196b0f894f..10100e22d5c6 100644
> > --- a/include/linux/writeback.h
> > +++ b/include/linux/writeback.h
> > @@ -80,6 +80,9 @@ struct writeback_control {
> >  	 */
> >  	struct swap_iocb **swap_plug;
> >
> > +	/* Target list for splitting a large folio */
> > +	struct list_head *list;
> > +
> >  	/* internal fields used by the ->writepages implementation: */
> >  	struct folio_batch fbatch;
> >  	pgoff_t index;
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index 05525e9e7423..0a5a68f7d0a0 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -1496,9 +1496,10 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> >  	 * and its shmem_writeback() needs them to be split when swapping.
> >  	 */
> >  	if (wbc->split_large_folio && folio_test_large(folio)) {
> > +try_split:
> >  		/* Ensure the subpages are still dirty */
> >  		folio_test_set_dirty(folio);
> > -		if (split_huge_page(page) < 0)
> > +		if (split_huge_page_to_list_to_order(page, wbc->list, 0))
>
> We check for split_large_folio, but we still send wbc->list in the i915
> writepage() case. Previously, we were sending a NULL list. Shouldn't we
> address that case too?

I guess I was missing the wbc initialization snippet:

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index fe69f2c8527d..174b95a9a988 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -308,6 +308,7 @@ void __shmem_writeback(size_t size, struct address_space *mapping)
 		.range_start = 0,
 		.range_end = LLONG_MAX,
 		.for_reclaim = 1,
+		.list = NULL,
 	};
 	unsigned long i;

> >  			goto redirty;
> >  		folio = page_folio(page);
> >  		folio_clear_dirty(folio);
> > @@ -1540,8 +1541,12 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> >  	}
> >
> >  	swap = folio_alloc_swap(folio);
> > -	if (!swap.val)
> > +	if (!swap.val) {
> > +		if (nr_pages > 1)
> > +			goto try_split;
> > +
> >  		goto redirty;
> > +	}
> >
> >  	/*
> >  	 * Add inode to shmem_unuse()'s list of swapped-out inodes,
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 277571815789..cf982cf2454f 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -628,7 +628,7 @@ typedef enum {
> >   * Calls ->writepage().
> >   */
> >  static pageout_t pageout(struct folio *folio, struct address_space *mapping,
> > -			 struct swap_iocb **plug)
> > +			 struct swap_iocb **plug, struct list_head *folio_list)
> >  {
> >  	/*
> >  	 * If the folio is dirty, only perform writeback if that write
> > @@ -676,6 +676,11 @@ static pageout_t pageout(struct folio *folio, struct address_space *mapping,
> >  		.swap_plug = plug,
> >  	};
> >
> > +	if (shmem_mapping(mapping)) {
> > +		wbc.list = folio_list;
> > +		wbc.split_large_folio = !IS_ENABLED(CONFIG_THP_SWAP);
> > +	}
> > +
> >  	folio_set_reclaim(folio);
> >  	res = mapping->a_ops->writepage(&folio->page, &wbc);
> >  	if (res < 0)
> > @@ -1259,23 +1264,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> >  					goto activate_locked_split;
> >  				}
> >  			}
> > -		} else if (folio_test_swapbacked(folio) &&
> > -			   folio_test_large(folio)) {
> > -
> > -			/*
> > -			 * Do not split shmem folio if no swap memory
> > -			 * available.
> > -			 */
> > -			if (!total_swap_pages)
> > -				goto activate_locked;
> > -
> > -			/*
> > -			 * Only split shmem folio when CONFIG_THP_SWAP
> > -			 * is not enabled.
> > -			 */
> > -			if (!IS_ENABLED(CONFIG_THP_SWAP) &&
> > -			    split_folio_to_list(folio, folio_list))
> > -				goto keep_locked;
> >  		}
> >
> >  		/*
> > @@ -1377,18 +1365,21 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> >  			 * starts and then write it out here.
> >  			 */
> >  			try_to_unmap_flush_dirty();
> > -try_pageout:
> > -			switch (pageout(folio, mapping, &plug)) {
> > +			switch (pageout(folio, mapping, &plug, folio_list)) {
> >  			case PAGE_KEEP:
> >  				goto keep_locked;
> >  			case PAGE_ACTIVATE:
> > -				if (shmem_mapping(mapping) && folio_test_large(folio) &&
> > -				    !split_folio_to_list(folio, folio_list)) {
> > +				/* Shmem can be split when writeback to swap */
> > +				if ((nr_pages > 1) && !folio_test_large(folio)) {
> > +					sc->nr_scanned -= (nr_pages - 1);
> >  					nr_pages = 1;
> > -					goto try_pageout;
> >  				}
> >  				goto activate_locked;
> >  			case PAGE_SUCCESS:
> > +				if ((nr_pages > 1) && !folio_test_large(folio)) {
> > +					sc->nr_scanned -= (nr_pages - 1);
> > +					nr_pages = 1;
> > +				}
> >  				stat->nr_pageout += nr_pages;
> >
> >  				if (folio_test_writeback(folio)) {