Re: [PATCH RFC v3 2/4] mm/pgtable: Make pfn_pte() filter out huge page attributes

From: Matthew Wilcox

Date: Thu Mar 05 2026 - 23:26:06 EST


On Thu, Mar 05, 2026 at 05:38:46PM +0800, Yin Tirui wrote:
> On 3/4/2026 3:52 PM, Jürgen Groß wrote:
> > Today it can either be used for a large page (which should be a pmd,
> > of course), or - much worse - you'd strip the _PAGE_PAT bit, which is
> > at the same position in PTEs.
> >
> > So basically you are removing the ability to use some cache modes.
> >
> > NACK!
> >
> >
> > Juergen
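A minimal user-space model (not kernel code) makes the aliasing concrete. The `PAGE_*` macros mirror the kernel's `_PAGE_*` bit positions, and `naive_filter_huge()` is a hypothetical stand-in for a pfn_pte() that blindly masks the huge bit. It maps a large-page pgprot and a 4K pgprot carrying PAT to the same result:

```c
#include <assert.h>
#include <stdint.h>

/* User-space model of x86 pgprot bits; values mirror the kernel's _PAGE_*. */
#define PAGE_PRESENT (UINT64_C(1) << 0)
#define PAGE_PSE     (UINT64_C(1) << 7)  /* PMD/PUD: entry maps a large page */
#define PAGE_PAT     (UINT64_C(1) << 7)  /* 4K PTE: PAT index bit (bit 12 in a PMD) */

/* The aliasing: one bit, two meanings depending on the table level. */
static_assert(PAGE_PSE == PAGE_PAT, "PSE and 4K PAT share bit 7");

/*
 * Hypothetical filter a generic pfn_pte() might apply to "drop the huge
 * bit". Given only a pgprot, it cannot know which meaning bit 7 has.
 */
static uint64_t naive_filter_huge(uint64_t prot)
{
	return prot & ~PAGE_PSE;
}
```

The second case is the one objected to above: a 4K pgprot whose cache mode depends on the PAT bit comes out of the filter selecting a different PAT entry.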
>
> Hi Willy and Jürgen,
>
> Following up on the x86 _PAGE_PSE and _PAGE_PAT aliasing issue.
>
> To achieve the goal of keeping pfn_pte() pure and completely eradicating the
> pte_clrhuge() anti-pattern, we need a way to ensure pfn_pte() never receives
> a pgprot with the huge bit set.
>
> @Jürgen:
> Just to be absolutely certain: is there any safe way to filter out the huge
> page attributes directly inside x86's pfn_pte() without breaking PAT? Or
> does the hardware bit-aliasing make this strictly impossible at the
> pfn_pte() level?
>
> @Willy @Jürgen:
> Assuming it is impossible to filter this safely inside pfn_pte() on x86, we
> must translate the pgprot before passing it down. To maintain strict
> type-safety and still drop pte_clrhuge(), I plan to introduce two
> arch-neutral wrappers:
>
> x86:
> /* Translates large prot to 4K. Shifts PAT back to bit 7, inherently
>  * clearing _PAGE_PSE */
> #define pgprot_huge_to_pte(prot) pgprot_large_2_4k(prot)
> /* Translates 4K prot to large. Shifts PAT to bit 12, strictly sets
>  * _PAGE_PSE */
> #define pgprot_pte_to_huge(prot) \
>       __pgprot(pgprot_val(pgprot_4k_2_large(prot)) | _PAGE_PSE)
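A user-space sketch of these x86 wrappers, assuming pgprot_4k_2_large() and pgprot_large_2_4k() do what the comments describe (move the PAT bit between bit 7 and bit 12). A pgprot is modeled as a plain `uint64_t`, and all `prot_*` functions are hypothetical model names:

```c
#include <assert.h>
#include <stdint.h>

/* User-space model; values mirror x86's _PAGE_* definitions. */
#define PAGE_BIT_PAT        7   /* 4K PTE layout; aliases PSE in a PMD */
#define PAGE_BIT_PAT_LARGE  12  /* PMD/PUD layout */
#define PAGE_PRESENT   (UINT64_C(1) << 0)
#define PAGE_PCD       (UINT64_C(1) << 4)
#define PAGE_PSE       (UINT64_C(1) << 7)
#define PAGE_PAT       (UINT64_C(1) << PAGE_BIT_PAT)
#define PAGE_PAT_LARGE (UINT64_C(1) << PAGE_BIT_PAT_LARGE)
#define PAT_MOVE       (PAGE_BIT_PAT_LARGE - PAGE_BIT_PAT)

static_assert((PAGE_PAT << PAT_MOVE) == PAGE_PAT_LARGE,
	      "shifting by PAT_MOVE carries bit 7 onto bit 12");

/* Model of pgprot_4k_2_large(): move PAT from bit 7 to bit 12. */
static uint64_t prot_4k_2_large(uint64_t val)
{
	return (val & ~(PAGE_PAT | PAGE_PAT_LARGE)) |
	       ((val & PAGE_PAT) << PAT_MOVE);
}

/*
 * Model of pgprot_large_2_4k(): move PAT from bit 12 back to bit 7.
 * Bit 7 (PSE in the large layout) is overwritten, so PSE is dropped.
 */
static uint64_t prot_large_2_4k(uint64_t val)
{
	return (val & ~(PAGE_PAT | PAGE_PAT_LARGE)) |
	       ((val & PAGE_PAT_LARGE) >> PAT_MOVE);
}

/* The two proposed wrappers, in model form. */
static uint64_t prot_pte_to_huge(uint64_t prot)
{
	return prot_4k_2_large(prot) | PAGE_PSE;
}

static uint64_t prot_huge_to_pte(uint64_t prot)
{
	return prot_large_2_4k(prot);
}
```

The round trip preserves both PAT and unrelated cache bits such as PCD, and a large pgprot without PAT comes back with bit 7 clear.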

I don't think we should have pgprot_large_2_4k(). Or rather, I think it
should be embedded in pmd_pgprot() / pud_pgprot(). That is, we should
have an 'ideal' pgprot which, on x86, perhaps matches the layout used at
the 4k level. pfn_pmd() should convert from the ideal pgprot to the one
actually used by PMDs (and set _PAGE_PSE?), and pmd_pgprot() should
convert back.
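One way to read this, sketched in user space: pfn_pmd() takes an "ideal" pgprot, assumed here to be exactly the 4K-PTE layout, moves PAT to bit 12 and sets PSE, while pmd_pgprot() does the reverse, so the split path can pass pmd_pgprot() straight to pfn_pte(). All `model_*` functions and `*_model_t` types are hypothetical; bit positions and the 52-bit physical address width mirror x86-64:

```c
#include <assert.h>
#include <stdint.h>

/* User-space model, not kernel code; bit positions mirror x86-64. */
#define PAGE_SHIFT          12
#define PMD_SHIFT           21
#define PAGE_BIT_PAT        7
#define PAGE_BIT_PAT_LARGE  12
#define PHYS_MASK      ((UINT64_C(1) << 52) - 1)
#define PAGE_PRESENT   (UINT64_C(1) << 0)
#define PAGE_PSE       (UINT64_C(1) << 7)                  /* PMD layout */
#define PAGE_PAT       (UINT64_C(1) << PAGE_BIT_PAT)       /* 4K layout */
#define PAGE_PAT_LARGE (UINT64_C(1) << PAGE_BIT_PAT_LARGE) /* PMD layout */
#define PAT_MOVE       (PAGE_BIT_PAT_LARGE - PAGE_BIT_PAT)
#define PTE_ADDR_MASK  (PHYS_MASK & ~((UINT64_C(1) << PAGE_SHIFT) - 1))
#define PMD_ADDR_MASK  (PHYS_MASK & ~((UINT64_C(1) << PMD_SHIFT) - 1))

/* Masking bit 7 out of a PMD drops PSE precisely because it aliases PAT. */
static_assert(PAGE_PSE == PAGE_PAT, "PSE and 4K PAT share bit 7");

typedef struct { uint64_t val; } pmd_model_t;
typedef struct { uint64_t val; } pte_model_t;

/* Hypothetical pfn_pmd(): ideal (4K-layout) prot in, PMD layout out. */
static pmd_model_t model_pfn_pmd(uint64_t pfn, uint64_t ideal)
{
	uint64_t prot = (ideal & ~(PAGE_PAT | PAGE_PAT_LARGE)) |
			((ideal & PAGE_PAT) << PAT_MOVE) | PAGE_PSE;
	pmd_model_t pmd = { ((pfn << PAGE_SHIFT) & PMD_ADDR_MASK) | prot };
	return pmd;
}

static uint64_t model_pmd_pfn(pmd_model_t pmd)
{
	return (pmd.val & PMD_ADDR_MASK) >> PAGE_SHIFT;
}

/* Hypothetical pmd_pgprot(): PMD layout in, ideal (4K-layout) prot out. */
static uint64_t model_pmd_pgprot(pmd_model_t pmd)
{
	uint64_t flags = pmd.val & ~PMD_ADDR_MASK;

	return (flags & ~(PAGE_PAT | PAGE_PAT_LARGE)) |
	       ((flags & PAGE_PAT_LARGE) >> PAT_MOVE);
}

/* pfn_pte() stays "pure": it only combines a pfn and an ideal prot. */
static pte_model_t model_pfn_pte(uint64_t pfn, uint64_t ideal)
{
	pte_model_t pte = { ((pfn << PAGE_SHIFT) & PTE_ADDR_MASK) | ideal };
	return pte;
}
```

With this split, neither call site needs pgprot_large_2_4k() or pte_clrhuge(); the layout conversion lives entirely inside the PMD accessors.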

> arm64:
> /*
> * Drops Block marker, enforces Page marker.
> * Strictly preserves the PTE_VALID bit to avoid validating PROT_NONE pages.
> */
> #define pgprot_huge_to_pte(prot) \
>       __pgprot((pgprot_val(prot) & ~(PMD_TYPE_MASK & ~PTE_VALID)) | \
>              (PTE_TYPE_PAGE & ~PTE_VALID))
> /*
> * Drops Page marker, sets Block marker.
> * Strictly preserves the PTE_VALID bit.
> */
> #define pgprot_pte_to_huge(prot) \
>       __pgprot((pgprot_val(prot) & ~(PTE_TYPE_MASK & ~PTE_VALID)) | \
>              (PMD_TYPE_SECT & ~PTE_VALID))
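A user-space model of the arm64 wrappers, restricted to descriptor bits [1:0] (bit 0 valid, bit 1 distinguishing a page descriptor from a block). The macro values mirror arm64's PTE_VALID, PTE_TYPE_* and PMD_TYPE_* definitions; the function names are hypothetical. It checks that only bit 1 ever changes, so an invalid (e.g. PROT_NONE) pgprot stays invalid in either direction:

```c
#include <assert.h>
#include <stdint.h>

/* User-space model of arm64 descriptor bits [1:0]; values mirror the
 * kernel's PTE_VALID, PTE_TYPE_* and PMD_TYPE_* definitions. */
#define PTE_VALID      (UINT64_C(1) << 0)
#define PTE_TYPE_MASK  (UINT64_C(3) << 0)
#define PTE_TYPE_PAGE  (UINT64_C(3) << 0)  /* page descriptor */
#define PMD_TYPE_MASK  (UINT64_C(3) << 0)
#define PMD_TYPE_SECT  (UINT64_C(1) << 0)  /* block (section) descriptor */

/* Bit 1 alone tells a page descriptor from a block descriptor. */
static_assert((PTE_TYPE_MASK & ~PTE_VALID) == (UINT64_C(1) << 1),
	      "the wrappers only ever touch bit 1");

static uint64_t prot_huge_to_pte(uint64_t prot)
{
	/* Clear bit 1, then set the page marker's bit 1; bit 0 untouched. */
	return (prot & ~(PMD_TYPE_MASK & ~PTE_VALID)) |
	       (PTE_TYPE_PAGE & ~PTE_VALID);
}

static uint64_t prot_pte_to_huge(uint64_t prot)
{
	/* Clear bit 1; PMD_TYPE_SECT & ~PTE_VALID is 0, so nothing is set. */
	return (prot & ~(PTE_TYPE_MASK & ~PTE_VALID)) |
	       (PMD_TYPE_SECT & ~PTE_VALID);
}
```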
>
> Usage:
> 1. Creating a huge pfnmap (remap_try_huge_pmd)
> pgprot_t huge_prot = pgprot_pte_to_huge(prot);
>
> /* No need for pmd_mkhuge() */
> pmd_t entry = pmd_mkspecial(pfn_pmd(pfn, huge_prot));
> set_pmd_at(mm, addr, pmd, entry);
>
> 2. Splitting a huge pfnmap (__split_huge_pmd_locked)
> pgprot_t small_prot = pgprot_huge_to_pte(pmd_pgprot(old_pmd));
>
> /* No need for pte_clrhuge() */
> pte_t entry = pfn_pte(pmd_pfn(old_pmd), small_prot);
> set_ptes(mm, haddr, pte, entry, HPAGE_PMD_NR);
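For the split path, a model of what the set_ptes() call relies on: it writes nr consecutive entries, advancing the PFN by one per entry and keeping the protection bits, so the single small_prot ends up in all HPAGE_PMD_NR PTEs. This assumes 4K base pages with 2M PMDs (512 PTEs per PMD); the `model_*` names are hypothetical and a PTE is modeled as a plain `uint64_t`:

```c
#include <assert.h>
#include <stdint.h>

/* User-space model; assumes 4K base pages and 2M PMD mappings. */
#define PAGE_SHIFT    12
#define PMD_SHIFT     21
#define HPAGE_PMD_NR  (1u << (PMD_SHIFT - PAGE_SHIFT))
#define PTE_ADDR_MASK (((UINT64_C(1) << 52) - 1) & \
		       ~((UINT64_C(1) << PAGE_SHIFT) - 1))

static_assert(HPAGE_PMD_NR == 512, "2M / 4K = 512 PTEs per PMD");

static uint64_t model_pfn_pte(uint64_t pfn, uint64_t prot)
{
	return ((pfn << PAGE_SHIFT) & PTE_ADDR_MASK) | prot;
}

static uint64_t model_pte_pfn(uint64_t pte)
{
	return (pte & PTE_ADDR_MASK) >> PAGE_SHIFT;
}

/*
 * Model of set_ptes(): fill nr consecutive entries starting from pte,
 * advancing the PFN by one per entry and leaving the prot bits alone.
 */
static void model_set_ptes(uint64_t *ptep, uint64_t pte, unsigned int nr)
{
	for (unsigned int i = 0; i < nr; i++)
		ptep[i] = pte + ((uint64_t)i << PAGE_SHIFT);
}
```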
>
>
> Willy, is there a better architectural approach to handle this and satisfy
> the type-safety requirement given the x86 hardware constraints?
>
> --
> Thanks,
> Yin Tirui