Re: [PATCH v2 2/4] mm/huge_memory: replace can_split_folio() with direct refcount calculation

From: David Hildenbrand (Red Hat)
Date: Mon Nov 24 2025 - 14:23:10 EST


On 11/24/25 18:05, Zi Yan wrote:
On 24 Nov 2025, at 5:41, David Hildenbrand (Red Hat) wrote:

On 11/22/25 03:55, Zi Yan wrote:
can_split_folio() is just a refcount comparison, making sure only the
split caller holds an extra pin. Open code it with
folio_expected_ref_count() != folio_ref_count() - 1. For the extra_pins
used by folio_ref_freeze(), add folio_cache_references() to calculate it.

Suggested-by: David Hildenbrand (Red Hat) <david@xxxxxxxxxx>
Signed-off-by: Zi Yan <ziy@xxxxxxxxxx>
---
include/linux/huge_mm.h | 1 -
mm/huge_memory.c | 43 ++++++++++++++++-------------------------
mm/vmscan.c | 3 ++-
3 files changed, 19 insertions(+), 28 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 97686fb46e30..1ecaeccf39c9 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -369,7 +369,6 @@ enum split_type {
SPLIT_TYPE_NON_UNIFORM,
};
-bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins);
int __split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
unsigned int new_order);
int folio_split_unmapped(struct folio *folio, unsigned int new_order);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c1f1055165dd..6c821c1c0ac3 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3455,23 +3455,6 @@ static void lru_add_split_folio(struct folio *folio, struct folio *new_folio,
}
}
-/* Racy check whether the huge page can be split */
-bool can_split_folio(struct folio *folio, int caller_pins, int *pextra_pins)
-{
- int extra_pins;
-
- /* Additional pins from page cache */
- if (folio_test_anon(folio))
- extra_pins = folio_test_swapcache(folio) ?
- folio_nr_pages(folio) : 0;
- else
- extra_pins = folio_nr_pages(folio);
- if (pextra_pins)
- *pextra_pins = extra_pins;
- return folio_mapcount(folio) == folio_ref_count(folio) - extra_pins -
- caller_pins;
-}
-
static bool page_range_has_hwpoisoned(struct page *page, long nr_pages)
{
for (; nr_pages; page++, nr_pages--)
@@ -3776,17 +3759,26 @@ int folio_check_splittable(struct folio *folio, unsigned int new_order,
return 0;
}
+/* Number of folio references from the pagecache or the swapcache. */
+static unsigned int folio_cache_references(const struct folio *folio)
+{
+ if (folio_test_anon(folio) && !folio_test_swapcache(folio))
+ return 0;
+ return folio_nr_pages(folio);
+}
+
static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int new_order,
struct page *split_at, struct xa_state *xas,
struct address_space *mapping, bool do_lru,
struct list_head *list, enum split_type split_type,
- pgoff_t end, int *nr_shmem_dropped, int extra_pins)
+ pgoff_t end, int *nr_shmem_dropped)
{
struct folio *end_folio = folio_next(folio);
struct folio *new_folio, *next;
int old_order = folio_order(folio);
int ret = 0;
struct deferred_split *ds_queue;
+ int extra_pins = folio_cache_references(folio);

Can we just inline the call do folio_cache_references() and get rid of extra_pins.
(which is a bad name either way)


if (folio_ref_freeze(folio, folio_cache_references(folio) + 1) {


BTW, now that we have this helper, I wonder if we should then also do for
clarification on the unfreeze path:

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0acdc2f26ee0c..7cbcf61b7971d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3824,8 +3824,7 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
zone_device_private_split_cb(folio, new_folio);
- expected_refs = folio_expected_ref_count(new_folio) + 1;
- folio_ref_unfreeze(new_folio, expected_refs);
+ folio_ref_unfreeze(new_folio, folio_cache_references(new_folio) + 1);
if (do_lru)
lru_add_split_folio(folio, new_folio, lruvec, list);
@@ -3868,8 +3867,7 @@ static int __folio_freeze_and_split_unmapped(struct folio *folio, unsigned int n
* Otherwise, a parallel folio_try_get() can grab @folio
* and its caller can see stale page cache entries.
*/
- expected_refs = folio_expected_ref_count(folio) + 1;
- folio_ref_unfreeze(folio, expected_refs);
+ folio_ref_unfreeze(folio, folio_cache_references(folio) + 1);
if (do_lru)
unlock_page_lruvec(lruvec);



Both make sense to me. Will make the change.

By comparing folio_cache_references() with folio_expected_ref_count(),
one difference is that folio_expected_ref_count() does not give right
refcount for shmem in swapcache.

Good point. Likely nobody runs into that right now because nobody can really does anything with these folios before they were re-added to the pagecache or mapped into page tables.


This is the folio_expected_ref_count() code:

if (folio_test_anon(folio)) {
/* One reference per page from the swapcache. */
ref_count += folio_test_swapcache(folio) << order;
} else {
/* One reference per page from the pagecache. */
ref_count += !!folio->mapping << order;
/* One reference from PG_private. */
ref_count += folio_test_private(folio);
}

shmem in swapcache mean !folio_test_anon(folio) && folio_test_swapcache(folio).

See below, it's actually

folio_test_anon(folio) && folio_test_swapbacked(folio)&& folio_test_swapcache(folio)

I think ...

The above code gives 0, but folio_cache_references() gives folio_nr_pages(folio).
It should not cause any issue, since IIUC shmem in swapcache happens
when the folio has an additional ref,
folio_expected_ref_count() != folio_ref_count() anyway. For split, it is
not supported yet,

Right.

so folio_expected_ref_count() in split code does not
affect shmem in swapcache. But folio_expected_ref_count() should be
fixed, right?

We should better handle it, agreed.

Staring at the history of folio_expected_ref_count() once again, back when we had folio_expected_refs() in migration code we didn't seem to handle it I think.

-static int folio_expected_refs(struct address_space *mapping,
- struct folio *folio)
-{
- int refs = 1;
- if (!mapping)
- return refs;
-
- refs += folio_nr_pages(folio);
- if (folio_test_private(folio))
- refs++;
-
- return refs;
-}


gup.c doesn't care, because the pages are still mapped.

khugepaged.c similarly.

memfd.c doesn't care because the pages are still in the pagecache.

So I suspect nothing is broken, but the migration case needs a second look.


Like:

if (folio_test_anon(folio)) {
/* One reference per page from the swapcache. */
ref_count += folio_test_swapcache(folio) << order;
} else {
/* One reference per page from shmem in the swapcache. */
ref_count += folio_test_swapcache(folio) << order;
/* One reference per page from the pagecache. */
ref_count += !!folio->mapping << order;
/* One reference from PG_private. */
ref_count += folio_test_private(folio);
}

or simplified into

if (!folio_test_anon(folio)) {
/* One reference per page from the pagecache. */
ref_count += !!folio->mapping << order;
/* One reference from PG_private. */
ref_count += folio_test_private(folio);
}
/* One reference per page from the swapcache (anon or shmem). */
ref_count += folio_test_swapcache(folio) << order;
?

That is incorrect I think due to swapcache being able to give false positives (PG_owner_priv_1).

--
Cheers

David