Re: [PATCH] mm: page_isolation: Avoid hugepage scan step underflow
From: David Hildenbrand (Arm)
Date: Mon Jun 01 2026 - 12:26:54 EST
On 5/22/26 11:35, Kaitao Cheng wrote:
> 在 2026/5/20 01:54, Andrew Morton 写道:
>> On Tue, 19 May 2026 20:16:46 +0800 Kaitao Cheng <kaitao.cheng@xxxxxxxxx> wrote:
>>
>>> page_is_unmovable() checks HugeTLB pages without holding hugetlb_lock and
>>> without pinning the folio. This is intentional for the pageblock scanning
>>> paths, but it means the HugeTLB folio can be freed concurrently after
>>> PageHuge() or folio_test_hugetlb() succeeds.
>>
>> Thanks. What are the userspace-visible runtime effects of this?
>
> The direct user-visible effect is not memory corruption, but degraded forward
> progress in the page isolation / contiguous allocation path.
>
> If the race makes folio_nr_pages() return 1 while the current page is still
> treated as a tail page of the old HugeTLB folio, the computed step can
> underflow:
>
> step = folio_nr_pages(folio) - folio_page_idx(folio, page);
>
> The caller then does:
>
> start_pfn += step;
>
> With unsigned arithmetic this can wrap and move start_pfn backwards, typically
> near the beginning of the old hugepage range rather than advancing past it. In
> many cases this only means rescanning part of the same hugepage range, so the
> effect may be limited to extra scanning work.
>
> However, it still violates the scanner's forward-progress assumption: step
> is expected to advance start_pfn. If the same transient state is observed
> repeatedly, the scanner can keep revisiting the same PFNs, causing excessive
> latency and, in the worst case, an apparent stall in operations that rely on
> page isolation or contiguous allocation.
>
>
> Here is another point raised by AI, the old code also used folio_test_lru()
> on a folio pointer obtained without holding a reference. If the folio is
> freed and the old head page is reused or observed as a tail page of another
> compound page, folio_test_lru() can reach const_folio_flags(), which asserts
> that the passed folio is not a tail page. On DEBUG_VM kernels, that can
> trigger a VM_BUG_ON_PGFLAGS() crash.
>
>>> The existing code avoids folio_hstate() and uses size_to_hstate() because
>>> the HugeTLB flag may already have been cleared. However, if
>>> size_to_hstate() returns NULL, the code still falls through and computes
>>> the scan step from folio_nr_pages(). If the folio has been freed and the
>>> head/large state has been cleared, folio_nr_pages() can return 1. When the
>>> current page is a tail page, subtracting folio_page_idx() from 1 can
>>> underflow and make the scanner skip too far.
>>>
>>> Treat a NULL hstate as unmovable so the scanner does not try to skip over
>>> an unstable HugeTLB folio. Once a valid hstate is found, derive the number
>>> of pages from the hstate instead of reading the folio size again. Also
>>> validate the page index before computing the step to avoid underflow if the
>>> page/folio relationship changed concurrently.
>>
>> This code sounds rather sketchy, and it sounds like it will remain
>> sketchy after the patch. And AI review says "hey, this code is
>> sketchy":
>> https://sashiko.dev/#/patchset/20260519121646.40833-1-kaitao.cheng@xxxxxxxxx
>
> Following David Hildenbrand's suggestion, I made some changes as shown below.
> I'm not sure whether there are still any other issues.
>
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -41,8 +41,14 @@ bool page_is_unmovable(struct zone *zone, struct page *page,
> * We need not scan over tail pages because we don't
> * handle each tail page individually in migration.
> */
> - if (PageHuge(page) || PageCompound(page)) {
> + if (PageCompound(page)) {
> struct folio *folio = page_folio(page);
> + unsigned long nr_pages, pfn;
> + unsigned int order;
> +
> + order = compound_order(&folio->page);
> + if (order > MAX_FOLIO_ORDER)
> + return true;
Would we also have to care about non-order-of-2?
>
> if (folio_test_hugetlb(folio)) {
> struct hstate *h;
> @@ -54,15 +60,16 @@ bool page_is_unmovable(struct zone *zone, struct page *page,
> * The huge page may be freed so can not
> * use folio_hstate() directly.
> */
> - h = size_to_hstate(folio_size(folio));
> - if (h && !hugepage_migration_supported(h))
> + h = size_to_hstate(PAGE_SIZE << order);
> + if (!h || !hugepage_migration_supported(h))
> return true;
> -
> - } else if (!folio_test_lru(folio)) {
> + } else if (!PageLRU(page)) {
Hm, is that required because we could VM_BUG_ON?
> return true;
> }
>
> - *step = folio_nr_pages(folio) - folio_page_idx(folio, page);
> + nr_pages = 1UL << order;
> + pfn = page_to_pfn(page);
> + *step = (pfn | (nr_pages - 1)) + 1 - pfn;
> return false;
> }
>
--
Cheers,
David