Re: [PATCH 2/5] arm64/mm: Replace PXD_TABLE_BIT with PXD_TYPE_[MASK|SECT]

From: Anshuman Khandual
Date: Mon Oct 14 2024 - 06:48:15 EST

On 10/9/24 18:58, Ryan Roberts wrote:
> On 05/10/2024 13:38, Anshuman Khandual wrote:
>> This modifies existing block mapping related helpers e.g [pmd|pud]_mkhuge()
>> , mk_[pmd|pud]_sect_prot() and pmd_trans_huge() to use PXD_TYPE_[MASK|SECT]
>> instead of corresponding PXD_TABLE_BIT. This also moves pmd_sect() earlier
>> for the symbol's availability preventing a build warning.
>>
>> While here this also drops pmd_val() check from pmd_trans_huge() helper, as
>> pmd_present() returning true already ensures that pmd_val() cannot be false
>>
>> Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
>> Cc: Will Deacon <will@xxxxxxxxxx>
>> Cc: Ard Biesheuvel <ardb@xxxxxxxxxx>
>> Cc: Ryan Roberts <ryan.roberts@xxxxxxx>
>> Cc: Mark Rutland <mark.rutland@xxxxxxx>
>> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
>> Cc: linux-kernel@xxxxxxxxxxxxxxx
>> Signed-off-by: Anshuman Khandual <anshuman.khandual@xxxxxxx>
>> ---
>> arch/arm64/include/asm/pgtable.h | 15 ++++++++-------
>> 1 file changed, 8 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index fa4c32a9f572..45c49c5ace80 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -484,12 +484,12 @@ static inline pmd_t pte_pmd(pte_t pte)
>>
>> static inline pgprot_t mk_pud_sect_prot(pgprot_t prot)
>> {
>> - return __pgprot((pgprot_val(prot) & ~PUD_TABLE_BIT) | PUD_TYPE_SECT);
>> + return __pgprot((pgprot_val(prot) & ~PUD_TYPE_MASK) | PUD_TYPE_SECT);
>> }
>>
>> static inline pgprot_t mk_pmd_sect_prot(pgprot_t prot)
>> {
>> - return __pgprot((pgprot_val(prot) & ~PMD_TABLE_BIT) | PMD_TYPE_SECT);
>> + return __pgprot((pgprot_val(prot) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT);
>> }
>>
>> static inline pte_t pte_swp_mkexclusive(pte_t pte)
>> @@ -554,10 +554,13 @@ static inline int pmd_protnone(pmd_t pmd)
>> * THP definitions.
>> */
>>
>> +#define pmd_sect(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == \
>> + PMD_TYPE_SECT)
>> +
>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> static inline int pmd_trans_huge(pmd_t pmd)
>> {
>> - return pmd_val(pmd) && pmd_present(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
>> + return pmd_present(pmd) && pmd_sect(pmd);
>
> Bug? Previously we would have returned true for a "present-invalid" PMD block
> mapping - that's one which is formatted as a PMD block mapping except the
> PTE_VALID bit is clear and PTE_PRESENT_INVALID is set. But now, due to
> pmd_sect() testing VALID is set (via PMD_TYPE_SECT), we no longer return true in
> this case.

Agreed, that would be a problem. It can be fixed by decoupling
pmd_present_invalid() from pte_present_invalid(), so that it checks both
low type bits (PMD_TYPE_MASK), rather than just the valid bit, together
with PTE_PRESENT_INVALID:

#define pmd_sect(pmd)	((pmd_val(pmd) & PMD_TYPE_MASK) == \
				 PMD_TYPE_SECT)

#define pmd_present_invalid(pmd) \
	((pmd_val(pmd) & (PMD_TYPE_MASK | PTE_PRESENT_INVALID)) == PTE_PRESENT_INVALID)

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
static inline int pmd_trans_huge(pmd_t pmd)
{
	return pmd_sect(pmd) || pmd_present_invalid(pmd);
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
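
To make the bit logic above easier to check, here is a minimal user-space sketch of the proposed helpers. It is not kernel code: the sketch_* names are hypothetical, plain uint64_t stands in for pmd_t, and the constant values (type field in bits [1:0], PTE_PRESENT_INVALID overlaying bit 11) are my assumptions about the current arm64 headers and should be checked against the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout; verify against the arm64 pgtable headers. */
#define PMD_TYPE_MASK        UINT64_C(3)          /* bits [1:0] */
#define PMD_TYPE_TABLE       UINT64_C(3)
#define PMD_TYPE_SECT        UINT64_C(1)
#define PTE_VALID            UINT64_C(1)          /* bit 0 */
#define PTE_PRESENT_INVALID  (UINT64_C(1) << 11)  /* assumed to overlay PTE_NG */

/* True only for a valid block entry: type field == SECT. */
static inline bool sketch_pmd_sect(uint64_t v)
{
	return (v & PMD_TYPE_MASK) == PMD_TYPE_SECT;
}

/* True when both type bits are clear and PTE_PRESENT_INVALID is set. */
static inline bool sketch_pmd_present_invalid(uint64_t v)
{
	return (v & (PMD_TYPE_MASK | PTE_PRESENT_INVALID)) == PTE_PRESENT_INVALID;
}

/* Proposed pmd_trans_huge(): valid block OR present-invalid block. */
static inline bool sketch_pmd_trans_huge(uint64_t v)
{
	return sketch_pmd_sect(v) || sketch_pmd_present_invalid(v);
}

/* Model of pmd_mkinvalid(): set PTE_PRESENT_INVALID, clear PTE_VALID. */
static inline uint64_t sketch_pmd_mkinvalid(uint64_t v)
{
	return (v | PTE_PRESENT_INVALID) & ~PTE_VALID;
}

/* Walks one block entry through invalidation and checks each predicate. */
static inline void sketch_check_trans_huge(void)
{
	uint64_t block = UINT64_C(0x200000) | PMD_TYPE_SECT;
	uint64_t inv = sketch_pmd_mkinvalid(block);

	assert(sketch_pmd_trans_huge(block));   /* valid block mapping */
	assert(!sketch_pmd_sect(inv));          /* VALID is now clear */
	assert(sketch_pmd_trans_huge(inv));     /* still reported as huge */
}
```

Under these assumptions an invalidated block entry fails sketch_pmd_sect() but is still caught by sketch_pmd_present_invalid(), while a table entry and an all-zero entry match neither predicate.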

>
>> }
>> #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>>
>> @@ -586,7 +589,7 @@ static inline int pmd_trans_huge(pmd_t pmd)
>>
>> #define pmd_write(pmd) pte_write(pmd_pte(pmd))
>>
>> -#define pmd_mkhuge(pmd) (__pmd(pmd_val(pmd) & ~PMD_TABLE_BIT))
>> +#define pmd_mkhuge(pmd) (__pmd((pmd_val(pmd) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT))
>
> I'm not sure if this also suffers from a similar problem? Is it possible that a
> present-invalid pmd would be passed to pmd_mkhuge()? If so, then we are now
> incorrectly setting the PTE_VALID bit.

pmd_mkhuge() converts a regular pmd into a huge page entry, and on arm64
creating a huge page entry also involves setting PTE_VALID. Why would a
present-invalid pmd be passed into pmd_mkhuge() without the intent
to make a huge entry?

There are just two generic use cases for pmd_mkhuge():

insert_pfn_pmd
	entry = pmd_mkhuge(pfn_t_pmd(pfn, prot));

set_huge_zero_folio
	entry = mk_pmd(&zero_folio->page, vma->vm_page_prot);
	entry = pmd_mkhuge(entry);

As in the instances in mm/debug_vm_pgtable.c, pmd_mkinvalid() should be
called on a PMD entry after pmd_mkhuge(), not the other way around.
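
The ordering argument can be illustrated with a small user-space model. Again this is only a sketch: the model_* helpers are hypothetical stand-ins operating on raw uint64_t values, and the bit positions (PMD_TABLE_BIT as bit 1, PTE_PRESENT_INVALID as bit 11) are assumptions to be checked against the arm64 headers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit layout; verify against the arm64 pgtable headers. */
#define PMD_TYPE_MASK        UINT64_C(3)          /* bits [1:0] */
#define PMD_TYPE_TABLE       UINT64_C(3)
#define PMD_TYPE_SECT        UINT64_C(1)
#define PMD_TABLE_BIT        (UINT64_C(1) << 1)
#define PTE_VALID            UINT64_C(1)          /* bit 0 */
#define PTE_PRESENT_INVALID  (UINT64_C(1) << 11)  /* assumed to overlay PTE_NG */

/* pmd_mkhuge() before the patch: only clears the table bit. */
static inline uint64_t model_pmd_mkhuge_old(uint64_t v)
{
	return v & ~PMD_TABLE_BIT;
}

/* pmd_mkhuge() after the patch: forces the type field to SECT. */
static inline uint64_t model_pmd_mkhuge(uint64_t v)
{
	return (v & ~PMD_TYPE_MASK) | PMD_TYPE_SECT;
}

/* Model of pmd_mkinvalid(): set PTE_PRESENT_INVALID, clear PTE_VALID. */
static inline uint64_t model_pmd_mkinvalid(uint64_t v)
{
	return (v | PTE_PRESENT_INVALID) & ~PTE_VALID;
}

/* Compares mkhuge-then-mkinvalid against the reversed order. */
static inline void model_check_ordering(void)
{
	uint64_t entry = UINT64_C(0x200000) | PMD_TYPE_TABLE;

	/* Expected order: the entry ends up present-invalid. */
	uint64_t good = model_pmd_mkinvalid(model_pmd_mkhuge(entry));
	assert(!(good & PTE_VALID) && (good & PTE_PRESENT_INVALID));

	/* Reversed order: the new pmd_mkhuge() sets PTE_VALID again. */
	uint64_t bad = model_pmd_mkhuge(model_pmd_mkinvalid(entry));
	assert(bad & PTE_VALID);

	/* The old pmd_mkhuge() would have left PTE_VALID clear. */
	uint64_t old = model_pmd_mkhuge_old(model_pmd_mkinvalid(entry));
	assert(!(old & PTE_VALID));
}
```

In this model the behavioural difference only shows up when pmd_mkhuge() is applied to an already invalidated entry, which is the case the callers listed above are not expected to hit.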

>
>>
>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> #define pmd_devmap(pmd) pte_devmap(pmd_pte(pmd))
>> @@ -614,7 +617,7 @@ static inline pmd_t pmd_mkspecial(pmd_t pmd)
>> #define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud)))
>> #define pud_write(pud) pte_write(pud_pte(pud))
>>
>> -#define pud_mkhuge(pud) (__pud(pud_val(pud) & ~PUD_TABLE_BIT))
>> +#define pud_mkhuge(pud) (__pud((pud_val(pud) & ~PUD_TYPE_MASK) | PUD_TYPE_SECT))
>>
>> #define __pud_to_phys(pud) __pte_to_phys(pud_pte(pud))
>> #define __phys_to_pud_val(phys) __phys_to_pte_val(phys)
>> @@ -712,8 +715,6 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>>
>> #define pmd_table(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == \
>> PMD_TYPE_TABLE)
>> -#define pmd_sect(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == \
>> - PMD_TYPE_SECT)
>> #define pmd_leaf(pmd) (pmd_present(pmd) && !pmd_table(pmd))
>> #define pmd_bad(pmd) (!pmd_table(pmd))
>>
>