Re: [PATCH] zswap: do not crash the kernel on decompression failure

From: Johannes Weiner
Date: Tue Feb 25 2025 - 22:57:46 EST


On Wed, Feb 26, 2025 at 02:45:41AM +0000, Yosry Ahmed wrote:
> On Tue, Feb 25, 2025 at 07:51:49PM -0500, Johannes Weiner wrote:
> > On Tue, Feb 25, 2025 at 01:32:00PM -0800, Nhat Pham wrote:
> > > + }
> > > mutex_unlock(&acomp_ctx->mutex);
> > >
> > > if (src != acomp_ctx->buffer)
> > > zpool_unmap_handle(zpool, entry->handle);
> > > + return ret;
> > > }
> > >
> > > /*********************************
> > > @@ -1018,6 +1028,7 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
> > > struct writeback_control wbc = {
> > > .sync_mode = WB_SYNC_NONE,
> > > };
> > > + int ret = 0;
> > >
> > > /* try to allocate swap cache folio */
> > > mpol = get_task_policy(current);
> > > @@ -1034,8 +1045,8 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
> > > * and freed when invalidated by the concurrent shrinker anyway.
> > > */
> > > if (!folio_was_allocated) {
> > > - folio_put(folio);
> > > - return -EEXIST;
> > > + ret = -EEXIST;
> > > + goto put_folio;
> > > }
> > >
> > > /*
> > > @@ -1048,14 +1059,17 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
> > > * be dereferenced.
> > > */
> > > tree = swap_zswap_tree(swpentry);
> > > - if (entry != xa_cmpxchg(tree, offset, entry, NULL, GFP_KERNEL)) {
> > > - delete_from_swap_cache(folio);
> > > - folio_unlock(folio);
> > > - folio_put(folio);
> > > - return -ENOMEM;
> > > + if (entry != xa_load(tree, offset)) {
> > > + ret = -ENOMEM;
> > > + goto fail;
> > > }
> > >
> > > - zswap_decompress(entry, folio);
> > > + if (!zswap_decompress(entry, folio)) {
> > > + ret = -EIO;
> > > + goto fail;
> > > + }
> > > +
> > > + xa_erase(tree, offset);
> > >
> > > count_vm_event(ZSWPWB);
> > > if (entry->objcg)
> > > @@ -1071,9 +1085,14 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
> > >
> > > /* start writeback */
> > > __swap_writepage(folio, &wbc);
> > > - folio_put(folio);
> > > + goto put_folio;
> > >
> > > - return 0;
> > > +fail:
> > > + delete_from_swap_cache(folio);
> > > + folio_unlock(folio);
> > > +put_folio:
> > > + folio_put(folio);
> > > + return ret;
> > > }
> >
> > Nice, yeah it's time for factoring out the error unwinding. If you
> > write it like this, you can save a jump in the main sequence:
> >
> > __swap_writepage(folio, &wbc);
> > ret = 0;
> > put:
> > folio_put(folio);
> > return ret;
> > delete_unlock:
>
> (I like how you sneaked the label rename in here, I didn't like 'fail'
> either :P)
>
> > delete_from_swap_cache(folio);
> > folio_unlock(folio);
> > goto put;
>
> I would go even further and avoid gotos completely (and make it super
> clear what gets executed in the normal path vs the failure path):
>
> __swap_writepage(folio, &wbc);
> folio_put(folio);
> if (ret) {
> delete_from_swap_cache(folio);
> folio_unlock(folio);
> }
> return ret;

The !folio_was_allocated case only needs the put. I guess that could
stay open-coded.

And I think you still need one goto for the other two error legs to
jump past the __swap_writepage.

> > Something like this?
> >
> > if (!zswap_decompress(entry, folio)) {
> > /*
> > * The zswap_load() return value doesn't indicate success or
> > * failure, but whether zswap owns the swapped out contents.
> > * This MUST return true here, otherwise swap_readpage() will
> > * read garbage from the backend.
> > *
> > * Success is signaled by marking the folio uptodate.
> > */
>
> We use the same trick in the folio_test_large() branch, so maybe this
> should be moved to above the function definition. Then we can perhaps
> refer to it in places where we return true wihout setting uptodate for
> added clarity if needed.

That makes sense to me. Nhat, what do you think?