Re: [PATCH 1/1] mm: implement page refcount locking via dedicated bit

From: David Hildenbrand (Arm)

Date: Thu Mar 05 2026 - 03:12:21 EST


>> if (page_ref_tracepoint_active(page_ref_mod_and_test))
>> __page_ref_mod_and_test(page, -nr, ret);
>> return ret;
>> @@ -204,6 +212,9 @@ static inline int page_ref_dec_and_test(struct page *page)
>> {
>> int ret = atomic_dec_and_test(&page->_refcount);
>>
>> + if (ret)
>> + ret = !atomic_cmpxchg_relaxed(&page->_refcount, 0, PAGEREF_LOCKED_BIT);
>> +
>> if (page_ref_tracepoint_active(page_ref_mod_and_test))
>> __page_ref_mod_and_test(page, -1, ret);
>> return ret;
>> @@ -228,14 +239,23 @@ static inline int folio_ref_dec_return(struct folio *folio)
>> return page_ref_dec_return(&folio->page);
>> }
>>
>> +#define _PAGEREF_LOCKED_LIMIT ((1 << 30) | PAGEREF_LOCKED_BIT)
>> +
>> static inline bool page_ref_add_unless_zero(struct page *page, int nr)
>> {
>> bool ret = false;
>> + int val;
>>
>> rcu_read_lock();
>> /* avoid writing to the vmemmap area being remapped */
>> - if (page_count_writable(page))
>> - ret = atomic_add_unless(&page->_refcount, nr, 0);
>> + if (page_count_writable(page)) {
>> + val = atomic_add_return(nr, &page->_refcount);
>> + ret = !(val & PAGEREF_LOCKED_BIT);
>> +
>> + /* Undo atomic_add() if counter is locked and scary big */
>> + while (unlikely((unsigned int)val >= _PAGEREF_LOCKED_LIMIT))
>> + val = atomic_cmpxchg_relaxed(&page->_refcount, val, PAGEREF_LOCKED_BIT);
It's still early here, but I think there is a problem.

Please bear with me :)

val = atomic_add_return(nr, &page->_refcount);
ret = !(val & PAGEREF_LOCKED_BIT);

This implies that we can grab a reference whenever the locked bit is not set.

Including when the refcount is 0.

Now, that works fine when racing with concurrent freeing, where the
freer was just able to decrement the refcount to 0 but has not yet set
PAGEREF_LOCKED_BIT.

But, what about any pages that don't have the PAGEREF_LOCKED_BIT set,
but have the refcount at 0 permanently?

That's, for example, the case for any pages where we do an explicit
set_page_count(page, 0);

For example, all pages we add to the page allocator through
__free_pages_core().

That means that someone could easily grab a reference to such pages,
including tail pages of allocated compound pages where the refcount is
still 0 -- or pages allocated with a frozen refcount, where we never do
the set_page_refcounted() in the buddy.

Bad things will happen when such a reference, wrongly obtained via
page_ref_add_unless_zero(), is dropped again and ends up freeing that page.


You'd have to make sure that there is no way we can reach refcount ==
0 without going through page_ref_dec_and_test() when actually freeing a
page.

One piece of the puzzle is handling set_page_count(p, 0) I think. But I
suspect that there might be other places where we don't even have the
set_page_count().

See vmemmap_get_tail() in
https://lore.kernel.org/r/20260227194302.274384-13-kas@xxxxxxxxxx for
example, where we know the refcount is 0, because we allocated the page
holding memmap with __GFP_ZERO.

For example, I think you'd have to make sure that *any* pages in the
buddy have their refcount set to PAGEREF_LOCKED_BIT, not 0.

So unless I am missing something, this is broken and requires a lot of
care to make sure that refcount == 0 is handled correctly everywhere.

--
Cheers,

David