Re: [PATCH 1/5] arm64/mm: Drop pte_mkhuge()

From: Ryan Roberts
Date: Mon Oct 14 2024 - 07:12:59 EST


On 14/10/2024 09:59, Anshuman Khandual wrote:
>
>
> On 10/9/24 18:50, Ryan Roberts wrote:
>> On 05/10/2024 13:38, Anshuman Khandual wrote:
>>> Core HugeTLB defines arch_make_huge_pte() fallback definition, which calls
>>> platform provided pte_mkhuge(). But if any platform already provides custom
>>> arch_make_huge_pte(), then it does not need to provide pte_mkhuge(). arm64
>>> defines arch_make_huge_pte(), but then also calls pte_mkhuge() internally.
>>> This creates confusion as if both of these callbacks are being used in core
>>> HugeTLB and required to be defined in the platform.
>>>
>>> This changes arch_make_huge_pte() to create block mapping directly and also
>>> drops off now redundant helper pte_mkhuge(), making things clear. Also this
>>> changes HugeTLB page creation from just clearing the PTE_TABLE_BIT (bit[1])
>>> to actually setting bits[1:0] via PTE_TYPE_[MASK|SECT] instead.
>>>
>>> Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
>>> Cc: Will Deacon <will@xxxxxxxxxx>
>>> Cc: Ard Biesheuvel <ardb@xxxxxxxxxx>
>>> Cc: Ryan Roberts <ryan.roberts@xxxxxxx>
>>> Cc: Mark Rutland <mark.rutland@xxxxxxx>
>>> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
>>> Cc: linux-kernel@xxxxxxxxxxxxxxx
>>> Signed-off-by: Anshuman Khandual <anshuman.khandual@xxxxxxx>
>>> ---
>>> arch/arm64/include/asm/pgtable-hwdef.h | 1 +
>>> arch/arm64/include/asm/pgtable.h | 5 -----
>>> arch/arm64/mm/hugetlbpage.c | 2 +-
>>> 3 files changed, 2 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
>>> index fd330c1db289..956a702cb532 100644
>>> --- a/arch/arm64/include/asm/pgtable-hwdef.h
>>> +++ b/arch/arm64/include/asm/pgtable-hwdef.h
>>> @@ -158,6 +158,7 @@
>>> #define PTE_VALID (_AT(pteval_t, 1) << 0)
>>> #define PTE_TYPE_MASK (_AT(pteval_t, 3) << 0)
>>> #define PTE_TYPE_PAGE (_AT(pteval_t, 3) << 0)
>>> +#define PTE_TYPE_SECT (_AT(pteval_t, 1) << 0)
>>> #define PTE_TABLE_BIT (_AT(pteval_t, 1) << 1)
>>> #define PTE_USER (_AT(pteval_t, 1) << 6) /* AP[1] */
>>> #define PTE_RDONLY (_AT(pteval_t, 1) << 7) /* AP[2] */
>>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>>> index c329ea061dc9..fa4c32a9f572 100644
>>> --- a/arch/arm64/include/asm/pgtable.h
>>> +++ b/arch/arm64/include/asm/pgtable.h
>>> @@ -438,11 +438,6 @@ static inline void __set_ptes(struct mm_struct *mm,
>>> }
>>> }
>>>
>>> -/*
>>> - * Huge pte definitions.
>>> - */
>>> -#define pte_mkhuge(pte) (__pte(pte_val(pte) & ~PTE_TABLE_BIT))
>>> -
>>> /*
>>> * Hugetlb definitions.
>>> */
>>> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
>>> index 5f1e2103888b..5922c95630ad 100644
>>> --- a/arch/arm64/mm/hugetlbpage.c
>>> +++ b/arch/arm64/mm/hugetlbpage.c
>>> @@ -361,7 +361,7 @@ pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
>>> {
>>> size_t pagesize = 1UL << shift;
>>>
>>> - entry = pte_mkhuge(entry);
>>> + entry = __pte((pte_val(entry) & ~PTE_TYPE_MASK) | PTE_TYPE_SECT);
>>
>> I think there may be an existing bug here; if pagesize == CONT_PTE_SIZE, then
>> entry will be placed in the level 3 table. In this case, shouldn't bit 1 remain
>> set, because at level 3, a page mapping is denoted by bits[1:0] = 3 ? Currently
>> its being unconditionally cleared.
>
> That's not a problem, pte_mkcont() brings back both the bits
> via PTE_TYPE_PAGE along with PTE_CONT.
>
> if (pagesize == CONT_PTE_SIZE) {
> entry = pte_mkcont(entry);
> } else if (pagesize == CONT_PMD_SIZE) {
> entry = pmd_pte(pmd_mkcont(pte_pmd(entry)));
> } else if (pagesize != PUD_SIZE && pagesize != PMD_SIZE) {
> pr_warn("%s: unrecognized huge page size 0x%lx\n",
> __func__, pagesize);
> }
>
> static inline pte_t pte_mkcont(pte_t pte)
> {
> pte = set_pte_bit(pte, __pgprot(PTE_CONT));
> return set_pte_bit(pte, __pgprot(PTE_TYPE_PAGE));

Oh wow, that's pretty hacky. Good job we never call pte_mkcont() on a
PTE_PRESENT_INVALID PTE. This would turn it valid again.

> }
>
> Although the same is not required for CONT_PMD_SIZE size huge
> pages where only PTE_CONT is enough.
>
> static inline pmd_t pmd_mkcont(pmd_t pmd)
> {
> return __pmd(pmd_val(pmd) | PMD_SECT_CONT);
> }
>
>>> if (pagesize == CONT_PTE_SIZE) {
>>> entry = pte_mkcont(entry);
>>> } else if (pagesize == CONT_PMD_SIZE) {
>>