Re: [PATCH v4] mm: assert exclusive nid/zonenum bits at the page/folio access sites
From: David Hildenbrand (Arm)
Date: Thu Jun 25 2026 - 02:46:30 EST
On 6/25/26 07:39, Hui Zhu wrote:
> From: Hui Zhu <zhuhui@xxxxxxxxxx>
>
> KCSAN reports a data race between page_to_nid()/folio_pgdat() reading
> page->flags and folio_trylock()/folio_lock() concurrently doing
> test_and_set_bit_lock(PG_locked, ...) on the same word, e.g.:
>
> BUG: KCSAN: data-race in __lruvec_stat_mod_folio / shmem_get_folio_gfp
>
> The node id and zone id occupy fixed bit-ranges of page->flags that
> are set once at page init and never modified afterwards, so they can
> never overlap with the low PG_locked/PG_waiters bits touched by the
> folio lock path.
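A minimal userspace sketch of the non-overlap argument, assuming a hypothetical 64-bit layout (6 node bits at the top of the word, 3 zone bits just below) and hypothetical lock-path bit positions 0 and 1. The real widths and offsets are config-dependent and come from page-flags-layout.h. It only models the mask arithmetic: setting PG_locked leaves the node and zone fields untouched.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout of a 64-bit flags word, for illustration only.
 * Real widths and offsets depend on the kernel configuration.
 */
#define NODES_WIDTH    6
#define ZONES_WIDTH    3
#define NODES_PGSHIFT  (64 - NODES_WIDTH)            /* bits 58..63 */
#define ZONES_PGSHIFT  (NODES_PGSHIFT - ZONES_WIDTH) /* bits 55..57 */
#define NODES_MASK     ((UINT64_C(1) << NODES_WIDTH) - 1)
#define ZONES_MASK     ((UINT64_C(1) << ZONES_WIDTH) - 1)

/* Hypothetical positions of the low bits the folio lock path touches. */
#define PG_LOCKED_BIT  0
#define PG_WAITERS_BIT 1

/* Bits written once at page init and only read afterwards. */
static uint64_t init_only_bits(void)
{
	return (NODES_MASK << NODES_PGSHIFT) | (ZONES_MASK << ZONES_PGSHIFT);
}

/* Bits the lock path may modify concurrently. */
static uint64_t lock_path_bits(void)
{
	return (UINT64_C(1) << PG_LOCKED_BIT) | (UINT64_C(1) << PG_WAITERS_BIT);
}

static int flags_nid(uint64_t flags)
{
	return (int)((flags >> NODES_PGSHIFT) & NODES_MASK);
}

static int flags_zonenum(uint64_t flags)
{
	return (int)((flags >> ZONES_PGSHIFT) & ZONES_MASK);
}

/* Model of "lock the folio": set PG_locked, leave every other bit alone. */
static uint64_t model_lock(uint64_t flags)
{
	return flags | (UINT64_C(1) << PG_LOCKED_BIT);
}
```

With these made-up values the node field covers bits 58-63 and the zone field bits 55-57, so neither mask can intersect the low lock bits, and decoding the node and zone after the model lock gives the same answer as before it.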
>
> ASSERT_EXCLUSIVE_BITS(mdf.f, ...) inside memdesc_nid()/memdesc_zonenum()
> checks a by-value copy of the flags word, not the actual shared
> page->flags/folio->flags being modified concurrently, so it doesn't
> reliably assert anything about the real race.
Is that the case? I thought the existing ASSERT_EXCLUSIVE_BITS() worked
reliably before.
Maybe the compiler optimizing away the local copy sorted that out for us.
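To make the C-level point behind the changelog claim concrete: a by-value parameter is a distinct object, so an address taken inside the callee names the copy rather than page->flags. This is a minimal userspace sketch in which memdesc_flags_t, struct page, the layout constants and the CHECK_ADDR() recorder are all hypothetical stand-ins; CHECK_ADDR() only mimics the way ASSERT_EXCLUSIVE_BITS() hands &(var) to KCSAN. It says nothing about what the compiler does after inlining, which is the question above.

```c
#include <assert.h>
#include <stdint.h>

#define NODES_PGSHIFT 58                /* hypothetical */
#define NODES_MASK    UINT64_C(0x3f)    /* hypothetical */

/* Hypothetical stand-ins mirroring the mdf.f usage in the patch. */
typedef struct { uint64_t f; } memdesc_flags_t;
struct page { memdesc_flags_t flags; };

/* The word other CPUs may be modifying, and whether the last check named it. */
static const uint64_t *shared_word;
static int check_hit_shared;

/* Records whether &(var) is the shared word (hypothetical helper). */
#define CHECK_ADDR(var) (check_hit_shared = (&(var) == shared_word))

/* By value: mdf is a copy, so &mdf.f is the parameter's own storage. */
static int nid_by_value(memdesc_flags_t mdf)
{
	CHECK_ADDR(mdf.f);
	return (int)((mdf.f >> NODES_PGSHIFT) & NODES_MASK);
}

/* At the call site: the check names page->flags itself. */
static int nid_at_call_site(const struct page *page)
{
	CHECK_ADDR(page->flags.f);
	return (int)((page->flags.f >> NODES_PGSHIFT) & NODES_MASK);
}
```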
> Move the assertion to
> page_to_nid(), folio_nid(), page_zonenum() and folio_zonenum(), where
> flags is dereferenced directly from the page/folio.
>
> On CONFIG_NUMA=n, NODES_MASK is 0 and the old memdesc_nid() body
> folded to a constant, so page->flags/folio->flags was never actually
> read. ASSERT_EXCLUSIVE_BITS() is a real runtime check that can't be
> folded away, so doing it unconditionally would add a pointless read
> of page->flags/folio->flags and a check that can never fire. Keep
> page_to_nid()/folio_nid() as plain "return 0" static inline stubs
> under CONFIG_NUMA=n instead.
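For reference, a minimal sketch of the "return 0" static inline stubs this paragraph describes, with hypothetical stand-in struct definitions so it compiles in userspace. Unlike a function-like macro such as #define page_to_nid(page) (0), a static inline stub still type-checks its argument.

```c
#include <assert.h>

/* Hypothetical stand-ins so the stubs compile outside the kernel. */
struct page { unsigned long flags; };
struct folio { unsigned long flags; };

/*
 * !CONFIG_NUMA: the node id is always 0, so ->flags is never read.
 * As static inline functions the stubs still type-check their
 * argument, which a bare "(0)" macro would silently discard.
 */
static inline int page_to_nid(const struct page *page)
{
	(void)page;
	return 0;
}

static inline int folio_nid(const struct folio *folio)
{
	(void)folio;
	return 0;
}
```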
>
> Signed-off-by: Hui Zhu <zhuhui@xxxxxxxxxx>
> ---
> Changelog:
> v4:
> Per comments from Andrew and Sashiko, make page_to_nid()/folio_nid()
> static inline stubs returning 0 under CONFIG_NUMA=n.
> v3:
> Per comments from Andrew and Sashiko, move ASSERT_EXCLUSIVE_BITS()
> out of memdesc_nid()/memdesc_zonenum() and into the page/folio call
> sites.
> v2:
> Per David's comments, remove useless comments and use
> ASSERT_EXCLUSIVE_BITS() in memdesc_nid() instead of data_race() in
> page_to_nid().
>
> include/linux/mm.h | 9 +++++++++
> include/linux/mmzone.h | 3 ++-
> 2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 485df9c2dbdd..56b39194605a 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2294,15 +2294,24 @@ static inline int memdesc_nid(memdesc_flags_t mdf)
> }
> #endif
>
> +#ifdef CONFIG_NUMA
> static inline int page_to_nid(const struct page *page)
> {
> + ASSERT_EXCLUSIVE_BITS(PF_POISONED_CHECK(page)->flags,
> + NODES_MASK << NODES_PGSHIFT);
Performing the PF_POISONED_CHECK() twice is a bit odd; once is sufficient.
Maybe simply do it once, as a separate statement, before the other two?
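Roughly this shape, as a userspace sketch: pf_poisoned_check() and assert_exclusive_bits() are hypothetical stand-ins for PF_POISONED_CHECK() and ASSERT_EXCLUSIVE_BITS() (the latter modelled as a no-op), the counter exists only to show the check now runs once, and the node-field layout is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical node-field layout, for illustration only. */
#define NODES_WIDTH   6
#define NODES_PGSHIFT (64 - NODES_WIDTH)
#define NODES_MASK    ((UINT64_C(1) << NODES_WIDTH) - 1)

struct page { uint64_t flags; };	/* hypothetical stand-in */

static int poison_checks;	/* how many times the check ran */

/* Stand-in for PF_POISONED_CHECK(): validate, then hand the page back. */
static const struct page *pf_poisoned_check(const struct page *page)
{
	poison_checks++;
	return page;
}

/* Stand-in for ASSERT_EXCLUSIVE_BITS(), modelled as a no-op here. */
#define assert_exclusive_bits(var, mask) ((void)(var), (void)(mask))

static int memdesc_nid(uint64_t flags)
{
	return (int)((flags >> NODES_PGSHIFT) & NODES_MASK);
}

/*
 * Run the poison check once, as its own statement, then use the page
 * for both the assertion and the read.
 */
static int page_to_nid(const struct page *page)
{
	pf_poisoned_check(page);
	assert_exclusive_bits(page->flags, NODES_MASK << NODES_PGSHIFT);
	return memdesc_nid(page->flags);
}
```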
> return memdesc_nid(PF_POISONED_CHECK(page)->flags);
> }
>
> static inline int folio_nid(const struct folio *folio)
> {
> + ASSERT_EXCLUSIVE_BITS(folio->flags,
> + NODES_MASK << NODES_PGSHIFT);
> return memdesc_nid(folio->flags);
> }
> +#else
> +#define page_to_nid(page) (0)
> +#define folio_nid(folio) (0)
> +#endif
>
LGTM
Acked-by: David Hildenbrand (Arm) <david@xxxxxxxxxx>
--
Cheers,
David