Re: [PATCH v4 1/2] mm: store zero pages to be swapped out in a bitmap
From: Barry Song
Date: Wed Sep 04 2024 - 03:55:31 EST
On Wed, Sep 4, 2024 at 7:22 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
>
> On Wed, Sep 4, 2024 at 12:17 AM Barry Song <21cnbao@xxxxxxxxx> wrote:
> >
> > On Wed, Sep 4, 2024 at 7:12 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
> > >
> > > [..]
> > > > > @@ -426,6 +515,26 @@ static void sio_read_complete(struct kiocb *iocb, long ret)
> > > > > mempool_free(sio, sio_pool);
> > > > > }
> > > > >
> > > > > +static bool swap_read_folio_zeromap(struct folio *folio)
> > > > > +{
> > > > > + unsigned int idx = swap_zeromap_folio_test(folio);
> > > > > +
> > > > > + if (idx == 0)
> > > > > + return false;
> > > > > +
> > > > > + /*
> > > > > + * Swapping in a large folio that is partially in the zeromap is not
> > > > > + * currently handled. Return true without marking the folio uptodate so
> > > > > + * that an IO error is emitted (e.g. do_swap_page() will sigbus).
> > > > > + */
> > > > > + if (WARN_ON_ONCE(idx < folio_nr_pages(folio)))
> > > > > + return true;
> > > >
> > > > Hi Usama, Yosry,
> > > >
> > > > I feel the warning is wrong as we could have the case where idx==0
> > > > is not zeromap but idx=1 is zeromap. idx == 0 doesn't necessarily
> > > > mean we should return false.
> > >
> > > Good catch. Yeah if idx == 0 is not in the zeromap but other indices
> > > are we will mistakenly read the entire folio from swap.
> > >
> > > >
> > > > What about the below change which both fixes the warning and unblocks
> > > > large folios swap-in?
> > >
> > > But I don't see how that unblocks the large folios swap-in work? We
> > > still need to actually handle the case where a large folio being
> > > swapped in is partially in the zeromap. Right now we warn and unlock
> > > the folio without calling folio_mark_uptodate(), which emits an IO
> > > error.
> >
> > I placed this in mm/swap.h so that during swap-in, it can filter out any large
> > folios where swap_zeromap_entries_count() is greater than 0 and less than
> > nr.
> >
> > I believe this case would be quite rare, as it can only occur when some small
> > folios that are swapped out happen to have contiguous and aligned swap
> > slots.
>
> I am assuming this would be near where the zswap_never_enabled() check
> is today, right?
The code is close to the area, but it doesn't rely on zeromap being
disabled.
>
> I understand the point of doing this to unblock the synchronous large
> folio swapin support work, but at some point we're gonna have to
> actually handle the cases where a large folio being swapped in is
> partially in the swap cache, zswap, the zeromap, etc.
>
> All these cases will need similar-ish handling, and I suspect we won't
> just skip swapping in large folios in all these cases.
I agree that this is definitely the goal. `swap_read_folio()` should be a
dependable API that always returns reliable data, regardless of whether
`zeromap` or `zswap` is involved. Despite these issues, mTHP swap-in shouldn't
be held back. Significant efforts are underway to support large folios in
`zswap`, and progress is being made. Not to mention we've already allowed
`zeromap` to proceed, even though it doesn't support large folios.
It's genuinely unfair to let the lack of mTHP support in `zeromap` and
`zswap` hold swap-in hostage.
Nonetheless, `zeromap` and `zswap` are distinct cases. With `zeromap`, we
permit almost all mTHP swap-ins, except for those rare situations where
small folios that were swapped out happen to have contiguous and aligned
swap slots.
swapcache is another quite different story, since our user scenarios begin from
the simplest sync io on mobile phones, we don't quite care about swapcache.
Thanks
Barry