Re: [RFC PATCH 2/2] mm: implement page refcount locking via dedicated bit
From: Gladyshev Ilya
Date: Fri Dec 19 2025 - 14:09:02 EST
On 12/19/2025 8:46 PM, Kiryl Shutsemau wrote:
On Fri, Dec 19, 2025 at 07:18:53PM +0300, Gladyshev Ilya wrote:But you can't trust that the page is safe to use after page_ref_dec_and_test() returns false, if I understood your example correctly. For example, current implementation can also lead to this 'bug' if you slightly change the order of atomic ops in your example:
On 12/19/2025 5:50 PM, Kiryl Shutsemau wrote:
On Fri, Dec 19, 2025 at 12:46:39PM +0000, Gladyshev Ilya wrote:Ack, thanks
The current atomic-based page refcount implementation treats zero
counter as dead and requires a compare-and-swap loop in folio_try_get()
to prevent incrementing a dead refcount. This CAS loop acts as a
serialization point and can become a significant bottleneck during
high-frequency file read operations.
This patch introduces FOLIO_LOCKED_BIT to distinguish between a
s/FOLIO_LOCKED_BIT/PAGEREF_LOCKED_BIT/
(temporary) zero refcount and a locked (dead/frozen) state. Because now
incrementing counter doesn't affect it's locked/unlocked state, it is
possible to use an optimistic atomic_fetch_add() in
page_ref_add_unless_zero() that operates independently of the locked bit.
The locked state is handled after the increment attempt, eliminating the
need for the CAS loop.
I don't think I follow.
Your trick with the PAGEREF_LOCKED_BIT helps with serialization against
page_ref_freeze(), but I don't think it does anything to serialize
against freeing the page under you.
Like, if the page in the process of freeing, page allocator sets its
refcount to zero and your version of page_ref_add_unless_zero()
successfully acquirees reference for the freed page.
How is it safe?
Page is freed only after a successful page_ref_dec_and_test() call, which
will set LOCKED_BIT. This bit will persist until set_page_count(1) is called
somewhere in the allocation path [alloc_pages()], and effectively block any
"use after free" users.
Okay, fair enough.
But what prevent the following scenario?
CPU0 CPU1
page_ref_dec_and_test()
atomic_dec_and_test() // refcount=0
page_ref_add_unless_zero()
atomic_add_return() // refcount=1, no LOCKED_BIT
page_ref_dec_and_test()
atomic_dec_and_test() // refcount=0
atomic_cmpxchg(0, LOCKED_BIT) // succeeds
atomic_cmpxchg(0, LOCKED_BIT) // fails
// return false to caller
// Use-after-free: BOOM!
Initial refcount value: 1 from CPU 0
CPU 0 CPU 1
page_ref_and_dec() page_ref_add_unless_zero()
atomic_add_return() [1 -> 2]
atomic_dec_and_test() [2 -> 1]
page_ref_dec_and_test()
atomic_dec_and_test() [1 -> 0]
/* page is logically freed here */
return false [cause 1!=0]
// Caller with use after free?