Re: [PATCH RFC 2/2] mm: add PMD-level huge page support for remap_pfn_range()

From: Yin Tirui
Date: Wed Sep 24 2025 - 22:17:51 EST




On 9/24/2025 6:39 AM, Matthew Wilcox wrote:
> On Tue, Sep 23, 2025 at 09:31:04PM +0800, Yin Tirui wrote:
>> + entry = pte_clrhuge(pfn_pte(pmd_pfn(old_pmd), pmd_pgprot(old_pmd)));
>
> This doesn't make sense. And I'm not saying you got this wrong; I
> suspect in terms of how things work today it's actually necessary.
> But the way we handle this stuff is so insane.

Thank you for pointing this out and the broader context.


> pte_clrhuge() should not exist. If we have a PTE, it can't have the
> huge bit set, by definition (don't anybody mention hugetlbfs because
> that is an entirely separate pile of broken horrors). I understand what
> you're trying to do here. You want to construct a PTE that points to
> the same address as the first page of the PMD and has the same
> permissions. But that *should* be written as:
>
> entry = pfn_pte(pmd_pfn(old_pmd), pmd_pgprot(old_pmd));
>
> right? Now, pmd_pgprot() might or might not want to return the huge bit
> set. I'm not sure. Perhaps you could have a look through and figure it

I've tested this on arm64, and pmd_pgprot() does return a pgprot with the huge bit set, which is exactly why I added pte_clrhuge().

> out. But pfn_pte() should never return a PTE with the huge bit set.
> So if it is set in the pgprot on entry, it should filter it out.
>
> There are going to be consequences to this. Maybe there's code
> somewhere that relies on pfn_pte() returning a PTE with the huge bit
> set. Perhaps it's hugetlbfs.

I'll try to refactor pfn_pte() and related functions to filter out the huge bit, and test the impact on hugetlbfs.
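
To make the intended behaviour concrete, here is a minimal userspace sketch of the idea, not kernel code. Every toy_* name, the 12-bit page shift, and the bit positions (including which bit counts as "huge") are hypothetical and do not match any real architecture's layout. The only point it illustrates is that the pgprot taken from a PMD may carry the huge bit, and that the PTE constructor masks it off so callers never need pte_clrhuge().

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: hypothetical types and bit layout, not any real arch. */
typedef struct { uint64_t val; } pte_t;
typedef struct { uint64_t val; } pmd_t;
typedef struct { uint64_t bits; } pgprot_t;

#define TOY_PAGE_SHIFT   12
#define TOY_PROT_MASK    ((UINT64_C(1) << TOY_PAGE_SHIFT) - 1)
#define TOY_PROT_PRESENT (UINT64_C(1) << 0)
#define TOY_PROT_WRITE   (UINT64_C(1) << 1)
#define TOY_PROT_HUGE    (UINT64_C(1) << 7)	/* hypothetical "huge" bit */

/* Build a PMD; a huge mapping legitimately keeps the huge bit. */
static inline pmd_t toy_pfn_pmd(uint64_t pfn, pgprot_t prot)
{
	pmd_t pmd = { (pfn << TOY_PAGE_SHIFT) | (prot.bits & TOY_PROT_MASK) };
	return pmd;
}

static inline uint64_t toy_pmd_pfn(pmd_t pmd)
{
	return pmd.val >> TOY_PAGE_SHIFT;
}

/* Returns the protection bits as stored, huge bit included. */
static inline pgprot_t toy_pmd_pgprot(pmd_t pmd)
{
	pgprot_t prot = { pmd.val & TOY_PROT_MASK };
	return prot;
}

/* Build a PTE; the huge bit is filtered here, so a PTE can never carry it. */
static inline pte_t toy_pfn_pte(uint64_t pfn, pgprot_t prot)
{
	uint64_t bits = prot.bits & TOY_PROT_MASK & ~TOY_PROT_HUGE;
	pte_t pte = { (pfn << TOY_PAGE_SHIFT) | bits };
	return pte;
}

static inline uint64_t toy_pte_pfn(pte_t pte)
{
	return pte.val >> TOY_PAGE_SHIFT;
}

static inline int toy_pte_huge(pte_t pte)
{
	return (pte.val & TOY_PROT_HUGE) != 0;
}

/* What the split path wants: a PTE for the first page covered by the PMD. */
static inline pte_t toy_first_pte_of_pmd(pmd_t pmd)
{
	pte_t pte = toy_pfn_pte(toy_pmd_pfn(pmd), toy_pmd_pgprot(pmd));

	assert(!toy_pte_huge(pte));	/* guaranteed by toy_pfn_pte() */
	return pte;
}
```

In this model the caller writes the plain pfn_pte(pmd_pfn(), pmd_pgprot()) form and the other permission bits pass through unchanged. Whether the filtering belongs in pfn_pte() or in pmd_pgprot() on a given architecture is exactly the open question above.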


> But we have to start cleaning this garbage up. I did some work with
> e3981db444a0 and the commits leading up to that. See
> https://lkml.kernel.org/r/20250402181709.2386022-12-willy@xxxxxxxxxxxxx
>
> I'd like pte_clrhuge() to be deleted from x86, not added to arm and
> riscv.

I completely agree with the goal of deleting pte_clrhuge() rather than expanding it. I'll study your referenced work and align my approach with your efforts.

Would you recommend I address the pfn_pte() and related function refactoring as part of this patch series, or should I submit it as a separate patch series?

--
Best regards,
Yin Tirui