Re: [PATCH] mm/zsmalloc: fix NULL pointer dereference in get_next_zpdesc

From: Sergey Senozhatsky

Date: Wed Feb 18 2026 - 00:56:53 EST


* Recovered from SPAM folder *

On (26/02/09 14:50), Joshua Hahn wrote:
> On Mon, 9 Feb 2026 19:32:57 +0000 Michael Fara <mjfara@xxxxxxxxx> wrote:
>
> Hello Michael,
>
> I hope you are doing well! Thank you for the patch. I'm not entirely sure
> whether the race condition you note here is real, or whether this is the
> right fix.

Thanks for taking a look!

> > get_next_zpdesc() calls get_zspage() which unconditionally dereferences
> > zpdesc->zspage without a NULL check. This causes a kernel oops when
> > zpdesc->zspage has been set to NULL by reset_zpdesc() during a race
> > between zspage destruction and page compaction/migration.
> >
> > The race window is documented in a TODO comment in zs_page_migrate():
> >
> > "nothing prevents a zspage from getting destroyed while it is
> > isolated for migration, as the page lock is temporarily dropped
> > after zs_page_isolate() succeeded"
>
> I'm taking a look at zsmalloc these days, and I was confused by this comment
> and remember looking into what was going on here as well :-) My thoughts
> were that it should be safe in every other case, though.
>
> > The sequence is:
> > 1. Compaction calls zs_page_isolate() on a zpdesc, then drops its
> > page lock.
> > 2. Concurrently, async_free_zspage() or free_zspage() destroys the
> > zspage, calling reset_zpdesc() which sets zpdesc->zspage = NULL.
>
> async_free_zspage isolates the zspage first by taking the class lock and
> then splicing the zspage out. free_zspage is always called with the
> class lock held, and it likewise removes the zspage from the fullness list.
>
> zs_free -> free_zspage -> trylock_zspage holds the class->lock
> In zs_page_migrate, we call replace_sub_page to remove the zpdesc from the
> chain before calling reset_zpdesc, so I don't think there would be a race there
> either.

Right, I'm having the same thoughts. I need to look closer, but I sort
of expect that class->lock should have kept us safe here.

> > 3. A subsequent zs_free() path calls trylock_zspage(), which iterates
> > zpdescs via get_next_zpdesc(). get_zspage() dereferences the now-
> > NULL backpointer, causing:
>
> So I don't see how a subsequent zs_free could race with reset_zpdesc on
> the same zpdesc; maybe I'm missing something? :-)
>
> > BUG: kernel NULL pointer dereference, address: 0000000000000000
> > RIP: 0010:free_zspage+0x26/0x100
> > Call Trace:
> > zs_free+0xf4/0x110
> > zswap_entry_free+0x7e/0x160
>
> Is this from a real crash? A more detailed crash dump would be helpful to
> see what else was happening on the other threads!
>
> > The migration side already has a NULL guard (zs_page_migrate line 1675:
> > "if (!zpdesc->zspage) return 0;"), but get_next_zpdesc() lacks the same
> > protection.
> >
> > Fix this by reading zpdesc->zspage directly in get_next_zpdesc()
> > instead of going through get_zspage(), and returning NULL when the
> > backpointer is NULL. This stops iteration safely — the caller treats
> > it as the end of the page chain.
>
> If we return NULL early, what happens to the remaining zpdescs in the
> zspage? In trylock_zspage for instance, we might exit early and think we
> successfully locked all zpdescs, when that isn't the case. It would eventually
> lead to a VM_BUG_ON_PAGE when trying to free the zspage later anyway.

Agreed.