Re: [PATCH 1/5] arm64/mm: Drop pte_mkhuge()

From: Anshuman Khandual
Date: Tue Oct 15 2024 - 00:23:20 EST




On 10/14/24 16:38, Ryan Roberts wrote:
> On 14/10/2024 09:59, Anshuman Khandual wrote:
>>
>>
>> On 10/9/24 18:50, Ryan Roberts wrote:
>>> On 05/10/2024 13:38, Anshuman Khandual wrote:
>>>> Core HugeTLB provides a fallback definition of arch_make_huge_pte(), which
>>>> calls the platform provided pte_mkhuge(). But if a platform already provides
>>>> a custom arch_make_huge_pte(), then it does not need to provide pte_mkhuge().
>>>> arm64 defines arch_make_huge_pte(), but then also calls pte_mkhuge() internally.
>>>> This creates the impression that both of these callbacks are used by core
>>>> HugeTLB and must be defined by the platform.
>>>>
>>>> This changes arch_make_huge_pte() to create the block mapping directly and
>>>> drops the now redundant helper pte_mkhuge(), making things clear. It also
>>>> changes HugeTLB page creation from just clearing PTE_TABLE_BIT (bit[1])
>>>> to explicitly setting bits[1:0] via PTE_TYPE_[MASK|SECT] instead.
>>>>
>>>> Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
>>>> Cc: Will Deacon <will@xxxxxxxxxx>
>>>> Cc: Ard Biesheuvel <ardb@xxxxxxxxxx>
>>>> Cc: Ryan Roberts <ryan.roberts@xxxxxxx>
>>>> Cc: Mark Rutland <mark.rutland@xxxxxxx>
>>>> Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx
>>>> Cc: linux-kernel@xxxxxxxxxxxxxxx
>>>> Signed-off-by: Anshuman Khandual <anshuman.khandual@xxxxxxx>
>>>> ---
>>>> arch/arm64/include/asm/pgtable-hwdef.h | 1 +
>>>> arch/arm64/include/asm/pgtable.h | 5 -----
>>>> arch/arm64/mm/hugetlbpage.c | 2 +-
>>>> 3 files changed, 2 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
>>>> index fd330c1db289..956a702cb532 100644
>>>> --- a/arch/arm64/include/asm/pgtable-hwdef.h
>>>> +++ b/arch/arm64/include/asm/pgtable-hwdef.h
>>>> @@ -158,6 +158,7 @@
>>>> #define PTE_VALID (_AT(pteval_t, 1) << 0)
>>>> #define PTE_TYPE_MASK (_AT(pteval_t, 3) << 0)
>>>> #define PTE_TYPE_PAGE (_AT(pteval_t, 3) << 0)
>>>> +#define PTE_TYPE_SECT (_AT(pteval_t, 1) << 0)
>>>> #define PTE_TABLE_BIT (_AT(pteval_t, 1) << 1)
>>>> #define PTE_USER (_AT(pteval_t, 1) << 6) /* AP[1] */
>>>> #define PTE_RDONLY (_AT(pteval_t, 1) << 7) /* AP[2] */
>>>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>>>> index c329ea061dc9..fa4c32a9f572 100644
>>>> --- a/arch/arm64/include/asm/pgtable.h
>>>> +++ b/arch/arm64/include/asm/pgtable.h
>>>> @@ -438,11 +438,6 @@ static inline void __set_ptes(struct mm_struct *mm,
>>>> }
>>>> }
>>>>
>>>> -/*
>>>> - * Huge pte definitions.
>>>> - */
>>>> -#define pte_mkhuge(pte) (__pte(pte_val(pte) & ~PTE_TABLE_BIT))
>>>> -
>>>> /*
>>>> * Hugetlb definitions.
>>>> */
>>>> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
>>>> index 5f1e2103888b..5922c95630ad 100644
>>>> --- a/arch/arm64/mm/hugetlbpage.c
>>>> +++ b/arch/arm64/mm/hugetlbpage.c
>>>> @@ -361,7 +361,7 @@ pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags)
>>>> {
>>>> size_t pagesize = 1UL << shift;
>>>>
>>>> - entry = pte_mkhuge(entry);
>>>> + entry = __pte((pte_val(entry) & ~PTE_TYPE_MASK) | PTE_TYPE_SECT);
>>>
>>> I think there may be an existing bug here; if pagesize == CONT_PTE_SIZE, then
>>> entry will be placed in the level 3 table. In this case, shouldn't bit 1 remain
>>> set, because at level 3 a page mapping is denoted by bits[1:0] = 3? Currently
>>> it's being unconditionally cleared.
>>
>> That's not a problem: pte_mkcont() brings back both of those bits
>> via PTE_TYPE_PAGE while also setting PTE_CONT.
>>
>> 	if (pagesize == CONT_PTE_SIZE) {
>> 		entry = pte_mkcont(entry);
>> 	} else if (pagesize == CONT_PMD_SIZE) {
>> 		entry = pmd_pte(pmd_mkcont(pte_pmd(entry)));
>> 	} else if (pagesize != PUD_SIZE && pagesize != PMD_SIZE) {
>> 		pr_warn("%s: unrecognized huge page size 0x%lx\n",
>> 			__func__, pagesize);
>> 	}
>>
>> static inline pte_t pte_mkcont(pte_t pte)
>> {
>> 	pte = set_pte_bit(pte, __pgprot(PTE_CONT));
>> 	return set_pte_bit(pte, __pgprot(PTE_TYPE_PAGE));
>
> Oh wow, that's pretty hacky. Good job we never call pte_mkcont() on a
> PTE_PRESENT_INVALID PTE. This would turn it valid again.

Ideally each individual HW page table helper should not do more than one thing
at a time. Here pte_mkcont() should take a pte that is already valid with
PTE_TYPE_PAGE and only set PTE_CONT on top of it.
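
Something along these lines (just an untested sketch, not part of this series) -
with pte_mkcont() reduced to setting only the contiguous hint, the CONT_PTE_SIZE
path in arch_make_huge_pte() would then need to keep bits[1:0] as PTE_TYPE_PAGE
itself instead of relying on pte_mkcont() to restore them:

/* Assumes the caller passes in an already valid page entry (bits[1:0] == 3) */
static inline pte_t pte_mkcont(pte_t pte)
{
	return set_pte_bit(pte, __pgprot(PTE_CONT));
}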

>
>> }
>>
>> Although the same is not required for CONT_PMD_SIZE huge
>> pages, where setting only PTE_CONT is enough.
>>
>> static inline pmd_t pmd_mkcont(pmd_t pmd)
>> {
>> 	return __pmd(pmd_val(pmd) | PMD_SECT_CONT);
>> }
>>
>>>> if (pagesize == CONT_PTE_SIZE) {
>>>> entry = pte_mkcont(entry);
>>>> } else if (pagesize == CONT_PMD_SIZE) {
>>>
>