Re: [PATCH 2/5] arm64/mm: Replace PXD_TABLE_BIT with PXD_TYPE_[MASK|SECT]

From: Ryan Roberts
Date: Mon Oct 14 2024 - 07:35:40 EST


On 14/10/2024 11:48, Anshuman Khandual wrote:
>
>
> On 10/9/24 18:58, Ryan Roberts wrote:
>> On 05/10/2024 13:38, Anshuman Khandual wrote:
>>> This modifies existing block mapping related helpers e.g [pmd|pud]_mkhuge()
>>> , mk_[pmd|pud]_sect_prot() and pmd_trans_huge() to use PXD_TYPE_[MASK|SECT]
>>> instead of corresponding PXD_TABLE_BIT. This also moves pmd_sect() earlier
>>> for the symbol's availability preventing a build warning.
>>>
>>> While here this also drops pmd_val() check from pmd_trans_huge() helper, as
>>> pmd_present() returning true already ensures that pmd_val() cannot be false
>>>
>>> Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
>>> Cc: Will Deacon <will@xxxxxxxxxx>
>>> Cc: Ard Biesheuvel <ardb@xxxxxxxxxx>
>>> Cc: Ryan Roberts <ryan.roberts@xxxxxxx>
>>> Cc: Mark Rutland <mark.rutland@xxxxxxx>
>>> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
>>> Cc: linux-kernel@xxxxxxxxxxxxxxx
>>> Signed-off-by: Anshuman Khandual <anshuman.khandual@xxxxxxx>
>>> ---
>>> arch/arm64/include/asm/pgtable.h | 15 ++++++++-------
>>> 1 file changed, 8 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>>> index fa4c32a9f572..45c49c5ace80 100644
>>> --- a/arch/arm64/include/asm/pgtable.h
>>> +++ b/arch/arm64/include/asm/pgtable.h
>>> @@ -484,12 +484,12 @@ static inline pmd_t pte_pmd(pte_t pte)
>>>
>>> static inline pgprot_t mk_pud_sect_prot(pgprot_t prot)
>>> {
>>> - return __pgprot((pgprot_val(prot) & ~PUD_TABLE_BIT) | PUD_TYPE_SECT);
>>> + return __pgprot((pgprot_val(prot) & ~PUD_TYPE_MASK) | PUD_TYPE_SECT);
>>> }
>>>
>>> static inline pgprot_t mk_pmd_sect_prot(pgprot_t prot)
>>> {
>>> - return __pgprot((pgprot_val(prot) & ~PMD_TABLE_BIT) | PMD_TYPE_SECT);
>>> + return __pgprot((pgprot_val(prot) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT);
>>> }
>>>
>>> static inline pte_t pte_swp_mkexclusive(pte_t pte)
>>> @@ -554,10 +554,13 @@ static inline int pmd_protnone(pmd_t pmd)
>>> * THP definitions.
>>> */
>>>
>>> +#define pmd_sect(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == \
>>> + PMD_TYPE_SECT)
>>> +
>>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>> static inline int pmd_trans_huge(pmd_t pmd)
>>> {
>>> - return pmd_val(pmd) && pmd_present(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT);
>>> + return pmd_present(pmd) && pmd_sect(pmd);
>>
>> Bug? Prevously we would have returned true for a "present-invalid" PMD block
>> mapping - that's one which is formatted as a PMD block mapping except the
>> PTE_VALID bit is clear and PTE_PRESENT_INVALID is set. But now, due to
>> pmd_sect() testing VALID is set (via PMD_TYPE_SECT), we no longer return true in
>> this case.
>
> Agreed, that will be problematic but the situation can be rectified by decoupling
> pmd_present_invalid() from pte_present_invalid() by checking for both last bits
> instead of just the valid bit against PTE_PRESENT_INVALID.
>
> #define pmd_sect(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == \
> PMD_TYPE_SECT)

I know this is pre-existing, but the fact that this depends on PMD_VALID being
set feels like something waiting to bite us. From the SW's PoV, we should get
the same answer regardless of whether PMD_VALID xor PTE_PRESENT_INVALID is set.
I know there is nobody depending on that right now, but it feels like a bug
waiting to happen. I'm not sure how you would fix that without having the SW
explcitly know about the table bit's existance though.

>
> #define pmd_present_invalid(pmd) \
> ((pmd_val(pmd) & (PMD_TYPE_MASK | PTE_PRESENT_INVALID)) == PTE_PRESENT_INVALID)

I read this as "if the type field is 0 and PTE_PRESENT_INVALID is 1 then it's
present-invalid". That doesn't really feel any better to me than the code
knowing there is a table bit. What's the benefit of doing this vs what the code
already does? It all feels quite hacky to me.

>
> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> static inline int pmd_trans_huge(pmd_t pmd)
> {
> return pmd_sect(pmd) || pmd_present_invalid(pmd);
> }
> #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> >>
>>> }
>>> #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>>>
>>> @@ -586,7 +589,7 @@ static inline int pmd_trans_huge(pmd_t pmd)
>>>
>>> #define pmd_write(pmd) pte_write(pmd_pte(pmd))
>>>
>>> -#define pmd_mkhuge(pmd) (__pmd(pmd_val(pmd) & ~PMD_TABLE_BIT))
>>> +#define pmd_mkhuge(pmd) (__pmd((pmd_val(pmd) & ~PMD_TYPE_MASK) | PMD_TYPE_SECT))
>>
>> I'm not sure if this also suffers from a similar problem? Is it possible that a
>> present-invalid pmd would be passed to pmd_mkhuge()? If so, then we are now
>> incorrectly setting the PTE_VALID bit.
> pmd_mkhuge() converts a regular pmd into a huge page and on arm64
> creating a huge page also involves setting PTE_VALID. Why would a
> present-invalid pmd is passed into pmd_mkhuge() without intending
> to make a huge entry ?
>
> There just two generic use cases for pmd_mkhuge().
>
> insert_pfn_pmd
> entry = pmd_mkhuge(pfn_t_pmd(pfn, prot));
>
> set_huge_zero_folio
> entry = mk_pmd(&zero_folio->page, vma->vm_page_prot);
> entry = pmd_mkhuge(entry);
>
> As instances in mm/debug_vm_pgtable.c, pmd_mkinvalid() should be
> called on a PMD entry after pmd_mkhuge() not the other way around.

I guess it depends on your perspective. I agree there is no issue today. But
from the core-mm's PoV, a present-invalid PMD should be indistinguishable from a
present (-valid) one.


>
>>
>>>
>>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>> #define pmd_devmap(pmd) pte_devmap(pmd_pte(pmd))
>>> @@ -614,7 +617,7 @@ static inline pmd_t pmd_mkspecial(pmd_t pmd)
>>> #define pud_mkyoung(pud) pte_pud(pte_mkyoung(pud_pte(pud)))
>>> #define pud_write(pud) pte_write(pud_pte(pud))
>>>
>>> -#define pud_mkhuge(pud) (__pud(pud_val(pud) & ~PUD_TABLE_BIT))
>>> +#define pud_mkhuge(pud) (__pud((pud_val(pud) & ~PUD_TYPE_MASK) | PUD_TYPE_SECT))
>>>
>>> #define __pud_to_phys(pud) __pte_to_phys(pud_pte(pud))
>>> #define __phys_to_pud_val(phys) __phys_to_pte_val(phys)
>>> @@ -712,8 +715,6 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>>>
>>> #define pmd_table(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == \
>>> PMD_TYPE_TABLE)
>>> -#define pmd_sect(pmd) ((pmd_val(pmd) & PMD_TYPE_MASK) == \
>>> - PMD_TYPE_SECT)
>>> #define pmd_leaf(pmd) (pmd_present(pmd) && !pmd_table(pmd))
>>> #define pmd_bad(pmd) (!pmd_table(pmd))
>>>
>>