Re: [PATCH] mm/memory: fix PMD/PUD checks in follow_pfnmap_start()

From: Lorenzo Stoakes (Oracle)

Date: Tue Mar 24 2026 - 09:14:43 EST


On Tue, Mar 24, 2026 at 01:46:20PM +0100, David Hildenbrand (Arm) wrote:
> On 3/24/26 12:04, Lorenzo Stoakes (Oracle) wrote:
> > On Mon, Mar 23, 2026 at 09:20:18PM +0100, David Hildenbrand (Arm) wrote:
> >> follow_pfnmap_start() suffers from two problems:
> >>
> >> (1) We are not re-fetching the pmd/pud after taking the PTL
> >>
> >> Therefore, we are not properly stabilizing what the lock actually
> >> protects. If there is concurrent zapping, we would indicate to the
> >> caller that we found an entry, however, that entry might already have
> >> been invalidated, or contain a different PFN after taking the lock.
> >>
> >> Properly use pmdp_get() / pudp_get() after taking the lock.
> >>
> >> (2) pmd_leaf() / pud_leaf() are not well defined on non-present entries
> >>
> >> pmd_leaf()/pud_leaf() could wrongly trigger on non-present entries.
> >>
> >> There is no real guarantee that pmd_leaf()/pud_leaf() returns something
> >> reasonable on non-present entries. Most architectures indeed either
> >> perform a present check or make it work by smart use of flags.
> >
> > It seems huge page split is the main user via pmd_invalidate() ->
> > pmd_mkinvalid().
> >
> > And I guess this is the kind of thing you mean by smart use of flags, for
> > x86-64:
>
> Exactly.
>
> [...]
>
> >
> >>
> >> However, for example loongarch checks the _PAGE_HUGE flag in pmd_leaf(),
> >> and always sets the _PAGE_HUGE flag in __swp_entry_to_pmd(). Whereby
> >> pmd_trans_huge() explicitly checks pmd_present(), pmd_leaf() does not
> >> do that.
> >
> > > But pmd_present() checks for _PAGE_HUGE, and if set checks
> > whether one of _PAGE_PRESENT, _PAGE_PROTNONE, _PAGE_PRESENT_INVALID is set,
> > and pmd_mkinvalid() sets _PAGE_PRESENT_INVALID (clearing _PAGE_PRESENT,
> > _VALID, _DIRTY, _PROTNONE) so it'd return true.
>
> pmd_present() will correctly indicate "not present" for, say, a softleaf
> migration entry.
>
> However, pmd_leaf() will indicate "leaf" for a softleaf migration entry.

Right yeah that's true. By definition softleaves are non-present. But as they
are leaves, you'd expect pXX_leaf() to return true.
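
To make the distinction concrete, here is a minimal userspace sketch of a
leaf check that only tests a "huge" flag, next to a separate present check.
The bit values, the model_* names and the entry constructors are all
hypothetical and chosen for illustration; they are not the real loongarch
definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout for illustration only; not any real architecture. */
#define MODEL_PRESENT   UINT64_C(0x1)
#define MODEL_HUGE      UINT64_C(0x2)
#define MODEL_PFN_SHIFT 12

typedef struct { uint64_t val; } model_pmd_t;

/* Leaf check that only looks at the "huge" flag, without a present check. */
static inline bool model_pmd_leaf(model_pmd_t pmd)
{
	return (pmd.val & MODEL_HUGE) != 0;
}

static inline bool model_pmd_present(model_pmd_t pmd)
{
	return (pmd.val & MODEL_PRESENT) != 0;
}

/* A present huge mapping: both flags set, PFN stored above the flag bits. */
static inline model_pmd_t model_mk_huge(uint64_t pfn)
{
	model_pmd_t pmd = { (pfn << MODEL_PFN_SHIFT) | MODEL_PRESENT | MODEL_HUGE };
	return pmd;
}

/* A non-present, migration-style entry that still carries the huge flag. */
static inline model_pmd_t model_mk_migration(uint64_t payload)
{
	model_pmd_t pmd = { (payload << MODEL_PFN_SHIFT) | MODEL_HUGE };
	return pmd;
}

/* What a walker of present entries wants: present first, then leaf. */
static inline bool model_is_present_leaf(model_pmd_t pmd)
{
	return model_pmd_present(pmd) && model_pmd_leaf(pmd);
}

/* Small self-check showing where the two predicates disagree. */
static inline void model_check(void)
{
	model_pmd_t mig = model_mk_migration(0x1234);

	assert(model_pmd_leaf(mig));          /* "leaf" says yes ...        */
	assert(!model_pmd_present(mig));      /* ... but it is not present  */
	assert(!model_is_present_leaf(mig));  /* so the walker must skip it */
}
```

In this model the migration-style entry is a leaf but not present, so a
walker that only asks "leaf?" would go on to read a PFN that is not there.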

>
> So not checking pmd_present() will actually treat non-present migration
> entries as present leafs in this function, which is wrong in the context
> of this function.
>
> We're walking present entries where things like pmd_pfn(pmd) etc make sense.

Ack, makes sense, thanks!
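
For reference, the ordering the patch establishes (lockless present check,
take the lock, re-fetch, re-check present, then leaf) can be modelled in
plain userspace C. This is only a single-threaded sketch: the walk_* names,
the flag bits and the "zap before lock" hook that simulates the race are all
assumptions made for the example, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits and shift, chosen only for this model. */
#define WALK_PRESENT   UINT64_C(0x1)
#define WALK_LEAF      UINT64_C(0x2)
#define WALK_PFN_SHIFT 12

enum walk_result { WALK_FOUND, WALK_NOT_FOUND, WALK_DESCEND, WALK_RETRY };

struct walk_slot {
	uint64_t entry;        /* stand-in for the entry behind pudp */
	int lock_depth;        /* 0 when unlocked */
	bool zap_before_lock;  /* simulate a zap racing with the walker */
};

static inline void walk_lock(struct walk_slot *s)
{
	/* Simulated race: the entry is cleared after the lockless read. */
	if (s->zap_before_lock) {
		s->entry = 0;
		s->zap_before_lock = false;
	}
	s->lock_depth++;
}

static inline void walk_unlock(struct walk_slot *s)
{
	s->lock_depth--;
}

static inline enum walk_result walk_follow(struct walk_slot *s, uint64_t *pfn)
{
	uint64_t e = s->entry;          /* lockless snapshot */

	if (!(e & WALK_PRESENT))
		return WALK_NOT_FOUND;
	if (!(e & WALK_LEAF))
		return WALK_DESCEND;    /* would continue at the next level */

	walk_lock(s);
	e = s->entry;                   /* re-fetch under the lock */
	if (!(e & WALK_PRESENT)) {
		walk_unlock(s);
		return WALK_NOT_FOUND;
	} else if (!(e & WALK_LEAF)) {
		walk_unlock(s);
		return WALK_RETRY;
	}
	*pfn = e >> WALK_PFN_SHIFT;
	return WALK_FOUND;              /* lock stays held for the caller */
}

/* Runs one walk, with or without the simulated zap, and drops the lock. */
static inline enum walk_result walk_demo(bool zap)
{
	struct walk_slot s = {
		.entry = (UINT64_C(0x42) << WALK_PFN_SHIFT) | WALK_PRESENT | WALK_LEAF,
		.lock_depth = 0,
		.zap_before_lock = zap,
	};
	uint64_t pfn = 0;
	enum walk_result r = walk_follow(&s, &pfn);

	if (r == WALK_FOUND) {
		assert(pfn == 0x42);
		walk_unlock(&s);
	}
	assert(s.lock_depth == 0);      /* lock is balanced on every path */
	return r;
}
```

Without the re-fetch, the walk with the simulated zap would report the stale
snapshot as found; with it, the walker sees the cleared entry and bails out.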

>
> >
> > pmd_leaf() simply checks to see if _PAGE_HUGE is set which should be
> > retained on split so should all still have worked?
> >
> > But anyway this is still worthwhile I think.
> >
> >>
> >> Let's check pmd_present()/pud_present() before assuming "this is a
> >> present PMD leaf" when spotting pmd_leaf()/pud_leaf(), like other page
> >> table handling code that traverses user page tables does.
> >>
> >> Given that non-present PMD entries are likely rare in VM_IO|VM_PFNMAP,
> >> (1) is likely more relevant than (2). It is questionable how often (1)
> >> would actually trigger, but let's CC stable to be sure.
> >>
> >> This was found by code inspection.
> >>
> >> Fixes: 6da8e9634bb7 ("mm: new follow_pfnmap API")
> >> Cc: stable@xxxxxxxxxxxxxxx
> >> Signed-off-by: David Hildenbrand (Arm) <david@xxxxxxxxxx>
> >
> > This looks correct to me, so:
> >
> > Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@xxxxxxxxxx>
>
> Thanks!
>
> >
> >> ---
> >> Gave it a quick test in a VM with MM selftests etc, but I am not sure if
> >> I actually trigger the follow_pfnmap machinery.
> >> ---
> >> mm/memory.c | 18 +++++++++++++++---
> >> 1 file changed, 15 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/mm/memory.c b/mm/memory.c
> >> index 219b9bf6cae0..2921d35c50ae 100644
> >> --- a/mm/memory.c
> >> +++ b/mm/memory.c
> >> @@ -6868,11 +6868,16 @@ int follow_pfnmap_start(struct follow_pfnmap_args *args)
> >>
> >>  	pudp = pud_offset(p4dp, address);
> >>  	pud = pudp_get(pudp);
> >> -	if (pud_none(pud))
> >> +	if (!pud_present(pud))
> >>  		goto out;
> >>  	if (pud_leaf(pud)) {
> >>  		lock = pud_lock(mm, pudp);
> >> -		if (!unlikely(pud_leaf(pud))) {
> >> +		pud = pudp_get(pudp);
> >> +
> >> +		if (unlikely(!pud_present(pud))) {
> >> +			spin_unlock(lock);
> >> +			goto out;
> >> +		} else if (unlikely(!pud_leaf(pud))) {
> >
> > > Tiny nit, but no need for else here. Sometimes compilers complain about
> > > this, but I'm not sure if such pedantry is enabled in the default kernel
> > > compiler flags :)
>
> You mean
>
> 	if (unlikely(!pud_present(pud))) {
> 		spin_unlock(lock);
> 		goto out;
> 	}
> 	if (...) {
>
> ?
>
> That just creates an additional LOC without any benefit IMHO. And we use
> it all over the place :)

Yeah I think the argument is you don't want to imply that it could somehow _not_
be else. But I think it's the compiler being a wee bit pedantic... :)
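
To spell out why the two spellings are interchangeable: when the first
branch always leaves via goto, the else adds nothing to the control flow. A
minimal sketch, where the flow_* names and the enum are made up for the
example:

```c
#include <assert.h>
#include <stdbool.h>

enum flow_outcome { FLOW_OUT, FLOW_RETRY, FLOW_FOUND };

/* Variant as in the patch: "else if" after a branch that ends in goto. */
static inline enum flow_outcome flow_with_else(bool present, bool leaf)
{
	enum flow_outcome ret = FLOW_FOUND;

	if (!present) {
		ret = FLOW_OUT;
		goto out;
	} else if (!leaf) {
		ret = FLOW_RETRY;
		goto out;
	}
out:
	return ret;
}

/* Variant from the nit: the else is dropped, costing one extra line. */
static inline enum flow_outcome flow_without_else(bool present, bool leaf)
{
	enum flow_outcome ret = FLOW_FOUND;

	if (!present) {
		ret = FLOW_OUT;
		goto out;
	}
	if (!leaf) {
		ret = FLOW_RETRY;
		goto out;
	}
out:
	return ret;
}

/* Exhaustively compares both variants over all four input combinations. */
static inline void flow_check(void)
{
	for (int p = 0; p < 2; p++)
		for (int l = 0; l < 2; l++)
			assert(flow_with_else(p, l) == flow_without_else(p, l));
}
```

Both variants return the same outcome for every input, so the choice is
purely stylistic.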

>
> In fact, I will beat any C compiler with the C standard that complains
> about that ;)

Haha, I'd like to see that!

>
> --
> Cheers,
>
> David

Cheers, Lorenzo