Re: [PATCH] mm/memory: fix PMD/PUD checks in follow_pfnmap_start()

From: Mike Rapoport

Date: Tue Mar 24 2026 - 04:49:41 EST


On Mon, Mar 23, 2026 at 09:20:18PM +0100, David Hildenbrand (Arm) wrote:
> follow_pfnmap_start() suffers from two problems:
>
> (1) We are not re-fetching the pmd/pud after taking the PTL
>
> Therefore, we are not properly stabilizing what the lock lock actually

^ lock lock

> protects. If there is concurrent zapping, we would indicate to the
> caller that we found an entry, however, that entry might already have
> been invalidated, or contain a different PFN after taking the lock.
>
> Properly use pmdp_get() / pudp_get() after taking the lock.
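
FWIW, a minimal userspace sketch of the pattern described in (1), just to
illustrate why the entry has to be re-read once the lock is held. This is
not the kernel code: every toy_* name, the bit layout and the race_hook
parameter are made up for illustration, and a C11 atomic_flag stands in
for the PTL.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy bit layout (an assumption, not any real architecture). */
#define TOY_PRESENT   0x1u
#define TOY_LEAF      0x2u
#define TOY_PFN_SHIFT 12

struct toy_table {
	atomic_flag lock;        /* stands in for the page table lock */
	_Atomic uint64_t entry;  /* stands in for a single PMD/PUD slot */
};

static uint64_t toy_mk_leaf(uint64_t pfn)
{
	return (pfn << TOY_PFN_SHIFT) | TOY_PRESENT | TOY_LEAF;
}

static void toy_init(struct toy_table *t, uint64_t entry)
{
	atomic_flag_clear(&t->lock);
	atomic_init(&t->entry, entry);
}

static void toy_lock(struct toy_table *t)
{
	while (atomic_flag_test_and_set_explicit(&t->lock, memory_order_acquire))
		;
}

static void toy_unlock(struct toy_table *t)
{
	atomic_flag_clear_explicit(&t->lock, memory_order_release);
}

static bool toy_is_present_leaf(uint64_t e)
{
	return (e & TOY_PRESENT) && (e & TOY_LEAF);
}

/* Models a concurrent zap: clears the entry under the lock. */
static void toy_zap(struct toy_table *t)
{
	toy_lock(t);
	atomic_store(&t->entry, 0);
	toy_unlock(t);
}

/*
 * Returns true with the lock held and *pfn filled in, or false with the
 * lock released. race_hook (hypothetical, may be NULL) runs between the
 * lockless peek and the lock so a race can be simulated deterministically.
 */
static bool toy_follow_start(struct toy_table *t, uint64_t *pfn,
			     void (*race_hook)(struct toy_table *))
{
	uint64_t e = atomic_load(&t->entry);	/* lockless peek */

	if (!toy_is_present_leaf(e))
		return false;

	if (race_hook)
		race_hook(t);

	toy_lock(t);
	e = atomic_load(&t->entry);		/* re-fetch under the lock */
	if (!toy_is_present_leaf(e)) {
		toy_unlock(t);
		return false;
	}
	*pfn = e >> TOY_PFN_SHIFT;
	return true;
}
```

Passing toy_zap as the race_hook makes the call fail instead of handing
back a PFN taken from the stale, pre-lock value; without the second load
it would report the entry that was already zapped.
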
>
> (2) pmd_leaf() / pud_leaf() are not well defined on non-present entries
>
> pmd_leaf()/pud_leaf() could wrongly trigger on non-present entries.
>
> There is no real guarantee that pmd_leaf()/pud_leaf() returns something
> reasonable on non-present entries. Most architectures indeed either
> perform a present check or make it work by smart use of flags.
>
> However, for example loongarch checks the _PAGE_HUGE flag in pmd_leaf(),
> and always sets the _PAGE_HUGE flag in __swp_entry_to_pmd(). Whereby
> pmd_trans_huge() explicitly checks pmd_present(), pmd_leaf() does not
> do that.
>
> Let's check pmd_present()/pud_present() before assuming "there is a
> present PMD leaf" when spotting pmd_leaf()/pud_leaf(), like other page
> table handling code that traverses user page tables does.
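
To spell out (2) with a toy model: if the leaf helper only tests a "huge"
flag and the non-present encoding also sets that flag, the leaf check
alone misfires. The demo_* helpers and the bit layout below are invented
for illustration and are not the loongarch definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Made-up bit layout, only loosely modelled on the description above. */
#define DEMO_PRESENT 0x1u
#define DEMO_HUGE    0x2u

static bool demo_pmd_present(uint64_t e)
{
	return (e & DEMO_PRESENT) != 0;
}

/* Leaf helper that looks at the huge flag only, with no present check. */
static bool demo_pmd_leaf(uint64_t e)
{
	return (e & DEMO_HUGE) != 0;
}

/* Non-present (swap-like) encoding that always carries the huge flag. */
static uint64_t demo_swp_entry_to_pmd(uint64_t swp)
{
	return (swp << 8) | DEMO_HUGE;
}

/* The combined check: present first, then leaf. */
static bool demo_is_present_leaf(uint64_t e)
{
	return demo_pmd_present(e) && demo_pmd_leaf(e);
}
```

Here demo_pmd_leaf() is true for the value returned by
demo_swp_entry_to_pmd(), while demo_is_present_leaf() correctly rejects
it and still accepts an entry that has both flags set.
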
>
> Given that non-present PMD entries are likely rare in VM_IO|VM_PFNMAP,
> (1) is likely more relevant than (2). It is questionable how often (1)
> would actually trigger, but let's CC stable to be sure.
>
> This was found by code inspection.
>
> Fixes: 6da8e9634bb7 ("mm: new follow_pfnmap API")
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: David Hildenbrand (Arm) <david@xxxxxxxxxx>

Acked-by: Mike Rapoport (Microsoft) <rppt@xxxxxxxxxx>

> ---
> Gave it a quick test in a VM with MM selftests etc, but I am not sure if
> I actually trigger the follow_pfnmap machinery.

Most probably not :)
KVM selftests might; I didn't really dig into that. But I doubt any selftest
would trigger potential races there.

> ---
> mm/memory.c | 18 +++++++++++++++---
> 1 file changed, 15 insertions(+), 3 deletions(-)

--
Sincerely yours,
Mike.