Re: [PATCH v2] mm/pagewalk: fix race between concurrent split and refault
From: David Hildenbrand (Arm)
Date: Thu Mar 26 2026 - 04:45:31 EST

On 3/26/26 01:50, Andrew Morton wrote:
> On Wed, 25 Mar 2026 10:59:16 +0100 Max Boone via B4 Relay <devnull+mboone.akamai.com@xxxxxxxxxx> wrote:
>
>> The splitting of a PUD entry in walk_pud_range() can race with
>> a concurrent thread refaulting the PUD leaf entry causing it to
>> try walking a PMD range that has disappeared.
>>
>> An example and reproduction of this is to try reading numa_maps of
>> a process while VFIO-PCI is setting up DMA (specifically the
>> vfio_pin_pages_remote call) on a large BAR for that process.
>>
>> This will trigger a kernel BUG:
>> vfio-pci 0000:03:00.0: enabling device (0000 -> 0002)
>> BUG: unable to handle page fault for address: ffffa23980000000
>> PGD 0 P4D 0
>> Oops: Oops: 0000 [#1] SMP NOPTI
>
> Thanks, updated.
>
> AI review has a couple of questions:
> https://sashiko.dev/#/patchset/20260317-pagewalk-check-pmd-refault-v1-1-f699a010f2b3%40akamai.com
>
> It flagged the same things against the v1 patch - maybe nobody checked?
>
"could a concurrent thread collapse the PUD into a huge leaf right
before pmd_offset() is called?"
No. Collapsing while holding the mmap lock etc. is impossible. That's what
the comment says: if there is a PUD table, the PUD table can't go away.
Not to mention that a thing like "PUD collapse" does not exist.

"Should pmd_offset() be passed the address of the snapshot (&pudval)
instead?"
No.

"Can this loop infinitely on unsplittable PUD leaves? ... For device
memory mapped as large PUD leaves, split_huge_pud() does nothing and the
entry remains a leaf."
split_huge_pud() -> __split_huge_pud() checks pud_trans_huge().
pud_trans_huge() is mostly just a check for "is this a PUD leaf": RISC-V
checks pud_leaf(), x86 checks just the _PAGE_PSE bit, and PPC64 with radix
checks the _PAGE_PTE bit.

So this will match any PUD leaf, and the code will split it (here: clear
the PTE), as long as pud_trans_huge() is properly implemented by an
architecture making use of PUD mappings. PUD support is guarded by
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD, and it's really just the three
architectures above that support it.

(Non-present entries might not be handled properly yet, but we don't
really support non-present entries at the PUD level, so that is not a
concern.)
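
To make the termination argument concrete, here is a minimal userspace
sketch of the retry pattern. It is a toy model under stated assumptions,
not the kernel code: every toy_* name and TOY_* bit is hypothetical, the
"split" simply clears the entry, and the concurrent refault is simulated by
re-installing the leaf a fixed number of times.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model of a PUD entry; not a kernel type. */
typedef uint64_t toy_pud_t;

#define TOY_PRESENT (1ULL << 0)
#define TOY_LEAF    (1ULL << 7)  /* stands in for a leaf bit such as _PAGE_PSE */

/* Leaf check that matches any present leaf entry. */
static bool toy_pud_leaf(toy_pud_t pud)
{
    return (pud & TOY_PRESENT) && (pud & TOY_LEAF);
}

/* Models a split that handles a leaf by clearing the entry. */
static void toy_split_huge_pud(toy_pud_t *pudp)
{
    assert(pudp != NULL);
    if (toy_pud_leaf(*pudp))
        *pudp = 0;
}

/*
 * Walks one entry. 'refaults' is how many times a simulated concurrent
 * thread re-installs the leaf right after a split. Returns the number
 * of retries the walker needed.
 */
static int toy_walk_pud(toy_pud_t *pudp, int refaults)
{
    int retries = 0;

    for (;;) {
        toy_pud_t pudval = *pudp;   /* snapshot of the entry */

        if (pudval == 0)
            return retries;         /* empty entry: nothing to walk */

        if (toy_pud_leaf(pudval)) {
            toy_split_huge_pud(pudp);
            if (refaults > 0) {     /* simulated refault of the leaf */
                *pudp = pudval;
                refaults--;
            }
            retries++;
            continue;               /* re-read the entry and retry */
        }

        return retries;             /* table entry: would descend here */
    }
}
```

In this model, a leaf entry with no refaults needs one retry, and each
simulated refault adds exactly one more. The loop only keeps going while
another thread keeps re-installing the leaf, never because the split left
the leaf in place.
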
Great waste of 15min of my time ;)
--
Cheers,
David