Re: [RFC PATCH 0/2] mm: improve folio refcount scalability

In reply to: Kiryl Shutsemau: "Re: [RFC PATCH 0/2] mm: improve folio refcount scalability"

From: Gladyshev Ilya

Date: Tue Jan 13 2026 - 02:32:04 EST

On 1/12/2026 7:17 PM, Kiryl Shutsemau wrote:
> On Mon, Jan 12, 2026 at 05:32:10PM +0300, Gladyshev Ilya wrote:
>> On 1/12/2026 2:49 PM, Kiryl Shutsemau wrote:
>>> On Mon, Jan 12, 2026 at 11:30:38AM +0300, Gladyshev Ilya wrote:
>>>> Gentle ping on this proposal
>>>
>>> I generally like the idea, but I would like to hear from folks who
>>> actually understand serialization.
>>>
>>> Also, do you have number for "a full CAS loop when the counter is
>>> approaching overflow" thing?
>>
>> I am not sure that overflow is a real problem because you need a very
>> specific race condition over a long time to achieve it...
>
> Yes. But if the page is popular for pinning, GUP_PIN_COUNTING_BIAS can
> cut the "very long time" substantially.
>
>> But as a safeguard, everything lower than 2^31 - #max concurrent
>> accesses (~#num cpu) should work, so let's say 2^30
>
> What I meant is when we put a branch/loop in the hot path, your
> performance numbers will likely not look as attractive. Am I wrong?

It would be under the same branch as the single CAS that already exists in this patch:

	if (page_count_writable(page)) {
		val = atomic_add_return(nr, &page->_refcount);
		ret = !(val & PAGEREF_LOCKED_BIT);

		if (unlikely(!ret)) {
			atomic_cmpxchg_relaxed(&page->_refcount, val, PAGEREF_LOCKED_BIT);
			/* [Proposed] if (failed && big enough) { CAS loop } */
		}
	}

Unless the "failed try_lock()" path is hot somewhere[1], the added branch will be hidden under the already existing unlikely() branch, so it should not show up on the fast path.

[1]: Which I doubt, because a failed try_lock() usually involves a heavy re-lookup.
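
To make the "[Proposed]" comment concrete, here is a rough userspace sketch of how the fallback might look. It uses C11 atomics instead of the kernel atomic_t API. The value of PAGEREF_LOCKED_BIT (bit 31 here), the PAGEREF_RESET_LIMIT threshold (the 2^30 mentioned above) and the helper name ref_try_add() are assumptions made for illustration only. This is not the code from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the lock flag is the top bit of a 32-bit counter. */
#define PAGEREF_LOCKED_BIT	(1u << 31)
/* Assumption: the "let's say 2^30" safeguard from the discussion. */
#define PAGEREF_RESET_LIMIT	(1u << 30)

/* Hypothetical helper: try to take nr references on a lockable counter. */
static bool ref_try_add(_Atomic uint32_t *refcount, uint32_t nr)
{
	/* Fast path: unconditional add, then check the lock flag. */
	uint32_t val = atomic_fetch_add(refcount, nr) + nr;

	if (!(val & PAGEREF_LOCKED_BIT))
		return true;

	/*
	 * Cold path: the counter is locked, so the add was stray.
	 * Try once to reset it to the bare lock flag.
	 */
	uint32_t expected = val;

	if (atomic_compare_exchange_strong_explicit(refcount, &expected,
						    PAGEREF_LOCKED_BIT,
						    memory_order_relaxed,
						    memory_order_relaxed))
		return false;

	/*
	 * The single CAS lost a race; 'expected' now holds the current
	 * value. Retry only while the counter is still locked and the
	 * accumulated stray increments are close to overflowing into
	 * the lock flag.
	 */
	while ((expected & PAGEREF_LOCKED_BIT) &&
	       (expected & ~PAGEREF_LOCKED_BIT) >= PAGEREF_RESET_LIMIT) {
		if (atomic_compare_exchange_weak_explicit(refcount, &expected,
							  PAGEREF_LOCKED_BIT,
							  memory_order_relaxed,
							  memory_order_relaxed))
			break;
	}

	return false;
}
```

The loop sits entirely behind the "locked" check, so the success path is still a single atomic add plus one predictable branch.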