Re: [PATCH] mm: zswap: fix data loss on SWP_SYNCHRONOUS_IO devices
From: Yosry Ahmed
Date: Mon Mar 25 2024 - 14:50:16 EST

On Mon, Mar 25, 2024 at 9:30 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Sun, Mar 24, 2024 at 02:22:46PM -0700, Yosry Ahmed wrote:
> > On Sun, Mar 24, 2024 at 2:04 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > >
> > > Zhongkun He reports data corruption when combining zswap with zram.
> > >
> > > The issue is the exclusive loads we're doing in zswap. They assume
> > > that all reads are going into the swapcache, which can assume
> > > authoritative ownership of the data and so the zswap copy can go.
> > >
> > > However, zram files are marked SWP_SYNCHRONOUS_IO, and faults will try
> > > to bypass the swapcache. This results in an optimistic read of the
> > > swap data into a page that will be dismissed if the fault fails due to
> > > races. In this case, zswap mustn't drop its authoritative copy.
> > >
> > > Link: https://lore.kernel.org/all/CACSyD1N+dUvsu8=zV9P691B9bVq33erwOXNTmEaUbi9DrDeJzw@xxxxxxxxxxxxxx/
> > > Reported-by: Zhongkun He <hezhongkun.hzk@xxxxxxxxxxxxx>
> > > Fixes: b9c91c43412f ("mm: zswap: support exclusive loads")
> > > Cc: stable@xxxxxxxxxxxxxxx [6.5+]
> > > Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> > > Tested-by: Zhongkun He <hezhongkun.hzk@xxxxxxxxxxxxx>
> >
> > Do we also want to mention somewhere (commit log or comment) that
> > keeping the entry in the tree is fine because we are still protected
> > from concurrent loads/invalidations/writeback by swapcache_prepare()
> > setting SWAP_HAS_CACHE or so?
>
> I don't think it's necessary, as zswap isn't doing anything special
> here. It's up to the caller to follow the generic swap exclusion
> protocol that zswap also adheres to. So IMO the relevant comment
> should be, and is, above that swapcache_prepare() in do_swap_page().

From the perspective of someone looking at the zswap code, it isn't
immediately clear what protects the zswap entry in the non-exclusive
load case from being freed from under us. At some point we had a
refcount, then we used to remove it from the tree under lock so others
wouldn't have access to it. Now it's less clear because we rely on
protection outside of zswap code.

We also document other places where we rely on the swapcache for
synchronization, so I think it may be worth briefly mentioning this
here as well, especially since in this code we explicitly check for the
folio not being in the swapcache. That said, I don't feel strongly
about it. Tracking down the SWP_SYNCHRONOUS_IO code should eventually
make it clear. Also, the commit log will end up having a link to this
thread anyway so the details are not completely unfindable :)
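
For what it's worth, the rule being discussed can be modeled in a few
lines of userspace C. This is only a rough sketch, not the code in
mm/zswap.c: toy_entry, toy_load() and TOY_PAGE_SIZE are made-up names,
and the model deliberately leaves out the SWAP_HAS_CACHE exclusion that
the caller provides, which is exactly the part that is not visible from
inside zswap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 16

/* Hypothetical stand-in for a zswap entry; not the kernel struct. */
struct toy_entry {
	bool present;              /* still reachable in the "tree"? */
	char data[TOY_PAGE_SIZE];  /* the compressed copy, modeled as raw bytes */
};

/*
 * Copy the stored data into @dst.
 *
 * Drop the entry only when the caller reads into the swapcache, which
 * then becomes the authoritative owner of the data. A read into a
 * private page (the SWP_SYNCHRONOUS_IO bypass) may be thrown away if
 * the fault fails, so the entry has to stay.
 *
 * Assumption: the caller already holds whatever exclusion keeps the
 * entry from being freed concurrently; this model does not provide it.
 */
static bool toy_load(struct toy_entry *e, char *dst, bool swapcache)
{
	assert(e != NULL && dst != NULL);

	if (!e->present)
		return false;

	memcpy(dst, e->data, TOY_PAGE_SIZE);

	if (swapcache)
		e->present = false;  /* exclusive load: our copy can go */

	return true;
}
```

In this model, a load with swapcache == false can be repeated after a
failed fault and still finds the data, while a load with
swapcache == true consumes the entry.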