Re: [PATCH v5] mm: assert exclusive nid/zonenum bits at the page/folio access sites
From: Lorenzo Stoakes
Date: Mon Jun 29 2026 - 06:30:06 EST
On Thu, Jun 25, 2026 at 02:08:39PM +0200, David Hildenbrand (Arm) wrote:
> On 6/25/26 14:07, Lorenzo Stoakes wrote:
> > On Thu, Jun 25, 2026 at 01:53:14PM +0200, David Hildenbrand (Arm) wrote:
> >> On 6/25/26 09:18, Hui Zhu wrote:
> >>> From: Hui Zhu <zhuhui@xxxxxxxxxx>
> >>>
> >>> KCSAN reports a data race between page_to_nid()/folio_pgdat() reading
> >>> page->flags and folio_trylock()/folio_lock() concurrently doing
> >>> test_and_set_bit_lock(PG_locked, ...) on the same word, e.g.:
> >>>
> >>> BUG: KCSAN: data-race in __lruvec_stat_mod_folio / shmem_get_folio_gfp
> >>>
> >>> The node id and zone id occupy fixed bit-ranges of page->flags that
> >>> are set once at page init and never modified afterwards, so they can
> >>> never overlap with the low PG_locked/PG_waiters bits touched by the
> >>> folio lock path.
> >>>
> >>> ASSERT_EXCLUSIVE_BITS(mdf.f, ...) inside memdesc_nid()/memdesc_zonenum()
> >>> checks a by-value copy of the flags word, not the actual shared
> >>> page->flags/folio->flags being modified concurrently, so it doesn't
> >>> reliably assert anything about the real race. Move the assertion to
> >>> page_to_nid(), folio_nid(), page_zonenum() and folio_zonenum(), where
> >>> flags is dereferenced directly from the page/folio.
> >>>
> >>> On CONFIG_NUMA=n, NODES_MASK is 0 and the old memdesc_nid() body
> >>> folded to a constant, so page->flags/folio->flags was never actually
> >>> read. ASSERT_EXCLUSIVE_BITS() is a real runtime check that can't be
> >>> folded away, so doing it unconditionally would add a pointless read
> >>> of page->flags/folio->flags and a check that can never fire. Keep
> >>> page_to_nid()/folio_nid() as plain "return 0" static inline stubs
> >>> under CONFIG_NUMA=n instead.
> >>>
> >>> Signed-off-by: Hui Zhu <zhuhui@xxxxxxxxxx>
> >>> Acked-by: David Hildenbrand (Arm) <david@xxxxxxxxxx>
> >>> ---
> >>> Changelog:
> >>> v5:
> >>> According to the comments of Sashiko, guard the ASSERT_EXCLUSIVE_BITS()
> >>> calls with #ifndef NODE_NOT_IN_PAGE_FLAGS (for nid) and #if
> >>> ZONES_WIDTH != 0 (for zonenum).
> >>> According to the comments of David, avoid calling
> >>> PF_POISONED_CHECK(page) twice in page_to_nid().
> >>> According to the warning of lkp, switch the CONFIG_NUMA=n
> >>> page_to_nid()/folio_nid() stubs from macros to static inline functions.
> >>> v4:
> >>> According to the comments of Andrew and Sashiko, set
> >>> page_to_nid()/folio_nid() as static inline stubs returning 0
> >>> under CONFIG_NUMA=n.
> >>> v3:
> >>> According to the comments of Andrew and Sashiko, move
> >>> ASSERT_EXCLUSIVE_BITS out of memdesc_nid()/memdesc_zonenum()
> >>> into the page/folio call sites.
> >>> v2:
> >>> According to the comments of David, remove useless comments and use
> >>> ASSERT_EXCLUSIVE_BITS() in memdesc_nid() instead of data_race() in
> >>> page_to_nid().
> >>>
> >>> include/linux/mm.h | 23 ++++++++++++++++++++++-
> >>> include/linux/mmzone.h | 7 ++++++-
> >>> 2 files changed, 28 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/include/linux/mm.h b/include/linux/mm.h
> >>> index 485df9c2dbdd..772bd1fc6fe7 100644
> >>> --- a/include/linux/mm.h
> >>> +++ b/include/linux/mm.h
> >>> @@ -2294,15 +2294,36 @@ static inline int memdesc_nid(memdesc_flags_t mdf)
> >>> }
> >>> #endif
> >>>
> >>> +#ifdef CONFIG_NUMA
> >>> static inline int page_to_nid(const struct page *page)
> >>> {
> >>> - return memdesc_nid(PF_POISONED_CHECK(page)->flags);
> >>> + const struct page *p = PF_POISONED_CHECK(page);
> >>> +
> >>> +#ifndef NODE_NOT_IN_PAGE_FLAGS
> >>> + ASSERT_EXCLUSIVE_BITS(p->flags, NODES_MASK << NODES_PGSHIFT);
> >>> +#endif
> >>> + return memdesc_nid(p->flags);
> >>> }
> >>>
> >>> static inline int folio_nid(const struct folio *folio)
> >>> {
> >>> +#ifndef NODE_NOT_IN_PAGE_FLAGS
> >>> + ASSERT_EXCLUSIVE_BITS(folio->flags,
> >>> + NODES_MASK << NODES_PGSHIFT);
> >>> +#endif
> >>
> >> This is getting ugly, really. We're leaking implementation details from
> >> memdesc_nid() into folio_nid().
> >>
> >> Maybe just turn memdesc_nid() into a macro where we can just do that check
> >> internally? Not the best thing in this world, but better than this here.
> >
> > Could also do:
> >
> > if (!IS_ENABLED(NODE_NOT_IN_PAGE_FLAGS))
> > ASSERT_EXCLUSIVE_BITS(folio->flags,
> > NODES_MASK << NODES_PGSHIFT);
> >
> > But not sure if it's that much better.
>
> It's still making an assumption of what the memdesc function we're calling will do.
Ack, yeah, we should avoid relying on implicit assumptions about usage here!
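
FWIW, a rough userspace sketch of the point made in the commit message (a by-value check only ever sees a copy) and of the macro idea suggested above. Everything in it is a stand-in invented for illustration, not the kernel definitions: the stub ASSERT_EXCLUSIVE_BITS() just records the address it was handed, NODES_WIDTH of 6 is arbitrary, and memdesc_nid_byval()/memdesc_nid_checked() are hypothetical names. It relies on GNU C statement expressions and is not a proposed implementation:

```c
#include <stdint.h>

/* Userspace stand-ins for illustration only; NOT the kernel definitions. */
typedef struct { unsigned long f; } memdesc_flags_t;
struct page { memdesc_flags_t flags; };

#define NODES_WIDTH   6 /* arbitrary width chosen for this sketch */
#define NODES_PGSHIFT (sizeof(unsigned long) * 8 - NODES_WIDTH)
#define NODES_MASK    ((1UL << NODES_WIDTH) - 1)

/* Address most recently handed to the stub assertion below. */
static uintptr_t last_checked;

/*
 * Stub: the real KCSAN macro checks the memory at &(var). Here we only
 * record that address so the difference between the two variants shows.
 */
#define ASSERT_EXCLUSIVE_BITS(var, mask) \
	do { last_checked = (uintptr_t)&(var); (void)(mask); } while (0)

/* By-value variant: the assertion sees the local copy 'mdf'. */
static inline int memdesc_nid_byval(memdesc_flags_t mdf)
{
	ASSERT_EXCLUSIVE_BITS(mdf.f, NODES_MASK << NODES_PGSHIFT);
	return (int)((mdf.f >> NODES_PGSHIFT) & NODES_MASK);
}

/*
 * Macro variant: 'mdf' is expanded at the call site, so the assertion
 * sees the caller's lvalue (e.g. page->flags) rather than a copy.
 * Caveat: 'mdf' is evaluated twice, so it must have no side effects.
 */
#define memdesc_nid_checked(mdf) ({					\
	ASSERT_EXCLUSIVE_BITS((mdf).f, NODES_MASK << NODES_PGSHIFT);	\
	(int)(((mdf).f >> NODES_PGSHIFT) & NODES_MASK);			\
})
```

With a struct page whose flags hold some node id, both variants return the same value, but only memdesc_nid_checked(page->flags) leaves last_checked pointing at the shared word. Callers such as folio_nid() would then not need to know which bits are being asserted on.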
>
> --
> Cheers,
>
> David
Cheers, Lorenzo