[PATCH v4] mm: assert exclusive nid/zonenum bits at the page/folio access sites

From: Hui Zhu

Date: Thu Jun 25 2026 - 01:40:41 EST


From: Hui Zhu <zhuhui@xxxxxxxxxx>

KCSAN reports a data race between page_to_nid()/folio_pgdat() reading
page->flags and folio_trylock()/folio_lock() concurrently doing
test_and_set_bit_lock(PG_locked, ...) on the same word, e.g.:

BUG: KCSAN: data-race in __lruvec_stat_mod_folio / shmem_get_folio_gfp

The node id and zone id occupy fixed bit-ranges of page->flags that
are set once at page init and never modified afterwards. Those ranges
are disjoint from the low PG_locked/PG_waiters bits touched by the
folio lock path, so the concurrent test_and_set_bit_lock() can never
change the bits being read, and the race is benign.
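A minimal userspace sketch of that argument follows. It assumes a hypothetical layout: the widths, shifts and helper names (init_flags(), flags_nid(), flags_zone(), lock_path_preserves_ids()) are illustrative stand-ins, not the real values from include/linux/page-flags-layout.h. Setting and clearing the low lock bits never changes the nid/zone fields read back from the high bits.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical layout for illustration only; the real widths and shifts
 * come from include/linux/page-flags-layout.h and depend on the config.
 */
#define PG_LOCKED_BIT	0
#define PG_WAITERS_BIT	1
#define BITS_PER_LONG	(8 * (int)sizeof(unsigned long))
#define NODES_WIDTH	10
#define ZONES_WIDTH	3
#define NODES_PGSHIFT	(BITS_PER_LONG - NODES_WIDTH)
#define ZONES_PGSHIFT	(NODES_PGSHIFT - ZONES_WIDTH)
#define NODES_MASK	((1UL << NODES_WIDTH) - 1)
#define ZONES_MASK	((1UL << ZONES_WIDTH) - 1)

/* "Page init": nid and zone are written once into the high bits. */
static inline unsigned long init_flags(unsigned long nid, unsigned long zone)
{
	return ((nid & NODES_MASK) << NODES_PGSHIFT) |
	       ((zone & ZONES_MASK) << ZONES_PGSHIFT);
}

static inline unsigned long flags_nid(unsigned long flags)
{
	return (flags >> NODES_PGSHIFT) & NODES_MASK;
}

static inline unsigned long flags_zone(unsigned long flags)
{
	return (flags >> ZONES_PGSHIFT) & ZONES_MASK;
}

/*
 * Model the lock path setting PG_locked and PG_waiters and then clearing
 * both, checking after each step that nid/zone read back unchanged.
 */
static inline bool lock_path_preserves_ids(unsigned long nid,
					   unsigned long zone)
{
	const unsigned long lock_bits =
		(1UL << PG_LOCKED_BIT) | (1UL << PG_WAITERS_BIT);
	const unsigned long id_bits =
		(NODES_MASK << NODES_PGSHIFT) | (ZONES_MASK << ZONES_PGSHIFT);
	unsigned long flags = init_flags(nid, zone);
	bool ok = true;

	/* The two bit-ranges are disjoint by construction. */
	assert((lock_bits & id_bits) == 0);

	flags |= 1UL << PG_LOCKED_BIT;		/* lock */
	ok &= flags_nid(flags) == (nid & NODES_MASK);
	ok &= flags_zone(flags) == (zone & ZONES_MASK);

	flags |= 1UL << PG_WAITERS_BIT;		/* a waiter queues up */
	ok &= flags_nid(flags) == (nid & NODES_MASK);
	ok &= flags_zone(flags) == (zone & ZONES_MASK);

	flags &= ~lock_bits;			/* unlock, clear waiters */
	ok &= flags_nid(flags) == (nid & NODES_MASK);
	ok &= flags_zone(flags) == (zone & ZONES_MASK);

	return ok;
}
```

With this layout, lock_path_preserves_ids() can only return false if a lock bit were moved into the nid/zone range.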

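ASSERT_EXCLUSIVE_BITS(mdf.f, ...) inside memdesc_nid()/memdesc_zonenum()
checks a by-value copy of the flags word on the stack, not the shared
page->flags/folio->flags word being modified concurrently, so it asserts
nothing about the real race. It also fails to cover the racy load:
KCSAN assumes that the access immediately following
ASSERT_EXCLUSIVE_BITS() touches only the masked bits, but the caller has
already loaded page->flags/folio->flags to build the copy before
memdesc_nid()/memdesc_zonenum() runs. Move the assertion to
page_to_nid(), folio_nid(), page_zonenum() and folio_zonenum(), right
before flags is dereferenced from the page/folio.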
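To make the by-value problem concrete, here is a toy userspace model. memdesc_flags_t and struct page are simplified stand-ins, and MODEL_ASSERT_EXCLUSIVE_BITS() is a hypothetical macro that only records whether it was pointed at the shared word; it is not KCSAN's ASSERT_EXCLUSIVE_BITS(). A helper that receives the flags by value can only ever point its check at its own copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins; the kernel's definitions carry more fields. */
typedef struct { unsigned long f; } memdesc_flags_t;
struct page { memdesc_flags_t flags; };

/* Illustrative zone field position, not the real ZONES_PGSHIFT. */
#define ZONES_PGSHIFT	20
#define ZONES_MASK	0x7UL

/* Address of the word that concurrent writers (the lock path) modify. */
static const unsigned long *shared_word;
/* Set by the model assertion: did it watch the shared word? */
static bool watched_shared_word;

/*
 * Hypothetical address-keyed check standing in for ASSERT_EXCLUSIVE_BITS();
 * it only records whether it was pointed at the shared word.
 */
#define MODEL_ASSERT_EXCLUSIVE_BITS(var) \
	(watched_shared_word = (&(var) == shared_word))

/* Old placement: the helper only sees its own by-value copy. */
static inline unsigned long zone_from_copy(memdesc_flags_t mdf)
{
	MODEL_ASSERT_EXCLUSIVE_BITS(mdf.f);	/* watches a stack copy */
	return (mdf.f >> ZONES_PGSHIFT) & ZONES_MASK;
}

/* New placement: the check is made on page->flags itself. */
static inline unsigned long zone_from_page(const struct page *page)
{
	MODEL_ASSERT_EXCLUSIVE_BITS(page->flags.f);	/* watches page->flags */
	return (page->flags.f >> ZONES_PGSHIFT) & ZONES_MASK;
}

/* Returns whether the chosen placement watched the real shared word. */
static inline bool placement_watches_shared_word(bool via_copy)
{
	struct page page = { .flags = { .f = 2UL << ZONES_PGSHIFT } };
	unsigned long zone;
	bool watched;

	shared_word = &page.flags.f;
	zone = via_copy ? zone_from_copy(page.flags) : zone_from_page(&page);
	assert(zone == 2);	/* both placements read the same value */
	watched = watched_shared_word;
	shared_word = NULL;	/* don't leave a dangling pointer behind */
	return watched;
}
```

Both placements compute the same zone, but only the second gives an address-keyed checker the word that concurrent writers actually touch.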

On CONFIG_NUMA=n, NODES_MASK is 0, so the old memdesc_nid() body
folded to a constant and page->flags/folio->flags was never actually
read. ASSERT_EXCLUSIVE_BITS() is a real runtime check that can't be
folded away, so doing it unconditionally would add a pointless read
of page->flags/folio->flags. It would also be wrong: KCSAN treats a
zero access mask as "no mask", so the assertion would cover the whole
word, including the PG_locked/PG_waiters bits, and report the very
lock-path writes it is meant to tolerate. Keep page_to_nid()/folio_nid()
as plain "return 0" static inline stubs under CONFIG_NUMA=n instead.
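A sketch of the CONFIG_NUMA=n reasoning, using a hypothetical MODEL_CONFIG_NUMA switch and illustrative NODES_WIDTH/NODES_PGSHIFT values: once the node field has zero width, the extracted nid is 0 for every flags value, so a stub that returns 0 without touching page->flags gives the same answer. Writing the stub as a static inline function rather than a function-like macro also keeps the argument type-checked.

```c
#include <assert.h>

/*
 * Hypothetical switch standing in for CONFIG_NUMA; the widths and shift
 * are illustrative, not the kernel's real page->flags layout.
 */
#ifdef MODEL_CONFIG_NUMA
#define NODES_WIDTH	10
#else
#define NODES_WIDTH	0
#endif
#define NODES_PGSHIFT	20
#define NODES_MASK	((1UL << NODES_WIDTH) - 1)	/* 0 when width is 0 */

/* Simplified stand-in for struct page. */
struct page { unsigned long flags; };

/* Mirrors the memdesc_nid() extraction; folds to 0 if NODES_MASK is 0. */
static inline int flags_nid(unsigned long flags)
{
	return (int)((flags >> NODES_PGSHIFT) & NODES_MASK);
}

#ifdef MODEL_CONFIG_NUMA
static inline int page_to_nid(const struct page *page)
{
	/* This is where the patch places ASSERT_EXCLUSIVE_BITS(). */
	return flags_nid(page->flags);
}
#else
/*
 * Stub: never reads page->flags, yet a caller passing something other
 * than a struct page pointer still gets a compile-time diagnostic.
 */
static inline int page_to_nid(const struct page *page)
{
	(void)page;
	return 0;
}
#endif
```

Building with -DMODEL_CONFIG_NUMA switches to the reading variant, where a page whose flags hold nid 3 at NODES_PGSHIFT yields 3.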

Signed-off-by: Hui Zhu <zhuhui@xxxxxxxxxx>
---
Changelog:
v4:
Per comments from Andrew and Sashiko, make page_to_nid()/folio_nid()
static inline stubs returning 0 under CONFIG_NUMA=n.
v3:
Per comments from Andrew and Sashiko, move ASSERT_EXCLUSIVE_BITS()
out of memdesc_nid()/memdesc_zonenum() and into the page/folio call
sites.
v2:
Per comments from David, remove unnecessary comments and use
ASSERT_EXCLUSIVE_BITS() in memdesc_nid() instead of data_race() in
page_to_nid().

 include/linux/mm.h     | 16 ++++++++++++++++
 include/linux/mmzone.h |  3 ++-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 485df9c2dbdd..56b39194605a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2294,15 +2294,31 @@ static inline int memdesc_nid(memdesc_flags_t mdf)
 }
 #endif
 
+#ifdef CONFIG_NUMA
 static inline int page_to_nid(const struct page *page)
 {
+	ASSERT_EXCLUSIVE_BITS(PF_POISONED_CHECK(page)->flags,
+			      NODES_MASK << NODES_PGSHIFT);
 	return memdesc_nid(PF_POISONED_CHECK(page)->flags);
 }
 
 static inline int folio_nid(const struct folio *folio)
 {
+	ASSERT_EXCLUSIVE_BITS(folio->flags,
+			      NODES_MASK << NODES_PGSHIFT);
 	return memdesc_nid(folio->flags);
 }
+#else
+static inline int page_to_nid(const struct page *page)
+{
+	return 0;
+}
+
+static inline int folio_nid(const struct folio *folio)
+{
+	return 0;
+}
+#endif
 
 #ifdef CONFIG_NUMA_BALANCING
 /* page access time bits needs to hold at least 4 seconds */
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index ca2712187147..56dffa966343 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1274,17 +1274,18 @@ static inline bool zone_is_empty(const struct zone *zone)
 
 static inline enum zone_type memdesc_zonenum(memdesc_flags_t flags)
 {
-	ASSERT_EXCLUSIVE_BITS(flags.f, ZONES_MASK << ZONES_PGSHIFT);
 	return (flags.f >> ZONES_PGSHIFT) & ZONES_MASK;
 }
 
 static inline enum zone_type page_zonenum(const struct page *page)
 {
+	ASSERT_EXCLUSIVE_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT);
 	return memdesc_zonenum(page->flags);
 }
 
 static inline enum zone_type folio_zonenum(const struct folio *folio)
 {
+	ASSERT_EXCLUSIVE_BITS(folio->flags, ZONES_MASK << ZONES_PGSHIFT);
 	return memdesc_zonenum(folio->flags);
 }
 
--
2.43.0