Re: [RFC PATCH 0/8] Reimplement huge pages without hugepd on powerpc 8xx
From: Christophe Leroy
Date: Tue Apr 16 2024 - 06:58:47 EST
Le 15/04/2024 à 21:12, Christophe Leroy a écrit :
>
>
> Le 12/04/2024 à 16:30, Peter Xu a écrit :
>> On Fri, Apr 12, 2024 at 02:08:03PM +0000, Christophe Leroy wrote:
>>>
>>>
>>> Le 11/04/2024 à 18:15, Peter Xu a écrit :
>>>> On Mon, Mar 25, 2024 at 01:38:40PM -0300, Jason Gunthorpe wrote:
>>>>> On Mon, Mar 25, 2024 at 03:55:53PM +0100, Christophe Leroy wrote:
>>>>>> This series reimplements hugepages with hugepd on powerpc 8xx.
>>>>>>
>>>>>> Unlike most architectures, powerpc 8xx HW requires a two-level
>>>>>> pagetable topology for all page sizes. So a leaf PMD-contig approach
>>>>>> is not feasible as such.
>>>>>>
>>>>>> Possible sizes are 4k, 16k, 512k and 8M.
>>>>>>
>>>>>> First level (PGD/PMD) covers 4M per entry. For 8M pages, two PMD
>>>>>> entries
>>>>>> must point to a single entry level-2 page table. Until now that was
>>>>>> done using hugepd. This series changes it to use standard page tables
>>>>>> where the entry is replicated 1024 times on each of the two
>>>>>> pagetables
>>>>>> refered by the two associated PMD entries for that 8M page.
>>>>>>
>>>>>> At the moment it has to look into each helper to know if the
>>>>>> hugepage ptep is a PTE or a PMD in order to know it is a 8M page or
>>>>>> a lower size. I hope this can me handled by core-mm in the future.
>>>>>>
>>>>>> There are probably several ways to implement stuff, so feedback is
>>>>>> very welcome.
>>>>>
>>>>> I thought it looks pretty good!
>>>>
>>>> I second it.
>>>>
>>>> I saw the discussions in patch 1. Christophe, I suppose you're
>>>> exploring
>>>> the big hammer over hugepd, and perhaps went already with the 32bit pmd
>>>> solution for nohash/32bit challenge you mentioned?
>>>>
>>>> I'm trying to position my next step; it seems like at least I should
>>>> not
>>>> adding any more hugepd code, then should I go with ARCH_HAS_HUGEPD
>>>> checks,
>>>> or you're going to have an RFC soon then I can base on top?
>>>
>>> Depends on what you expect by "soon".
>>>
>>> I sure won't be able to send any RFC before end of April.
>>>
>>> Should be possible to have something during May.
>>
>> That's good enough, thanks. I'll see what is the best I can do.
>>
>> Then do you think I can leave p4d/pgd leaves alone? Please check the
>> other
>> email where I'm not sure whether pgd leaves ever existed for any of
>> PowerPC. That's so far what I plan to do, on teaching pgtable walkers
>> recognize pud and lower for all leaves. Then if Power can switch from
>> hugepd to this it should just work.
>
> Well, if I understand correctly, something with no PMD will include
> <asm-generic/pgtable-nopmd.h> and will therefore only have .... pmd
> entries (hence no pgd/p4d/pud entries). Looks odd but that's what it is.
> pgd_populate(), p4d_populate(), pud_populate() are all "do { } while
> (0)" and only pmd_populate exists. So only pmd_leaf() will exist in that
> case.
>
> And therefore including <asm-generic/pgtable-nop4d.h> means .... you
> have p4d entries. Doesn't mean you have p4d_leaf() but that needs to be
> checked.
>
>
>>
>> Even if pgd exists (then something I overlooked..), I'm wondering whether
>> we can push that downwards to be either pud/pmd (and looks like we all
>> agree p4d is never used on Power). That may involve some pgtable
>> operations moving from pgd level to lower, e.g. my pure imagination would
>> look like starting with:
>
> Yes I think there is no doubt that p4d is never used:
>
> arch/powerpc/include/asm/book3s/32/pgtable.h:#include
> <asm-generic/pgtable-nopmd.h>
> arch/powerpc/include/asm/book3s/64/pgtable.h:#include
> <asm-generic/pgtable-nop4d.h>
> arch/powerpc/include/asm/nohash/32/pgtable.h:#include
> <asm-generic/pgtable-nopmd.h>
> arch/powerpc/include/asm/nohash/64/pgtable-4k.h:#include
> <asm-generic/pgtable-nop4d.h>
>
> But that means that PPC32 have pmd entries and PPC64 have p4d entries ...
>
>>
>> #define PTE_INDEX_SIZE PTE_SHIFT
>> #define PMD_INDEX_SIZE 0
>> #define PUD_INDEX_SIZE 0
>> #define PGD_INDEX_SIZE (32 - PGDIR_SHIFT)
>>
>> To:
>>
>> #define PTE_INDEX_SIZE PTE_SHIFT
>> #define PMD_INDEX_SIZE (32 - PMD_SHIFT)
>> #define PUD_INDEX_SIZE 0
>> #define PGD_INDEX_SIZE 0
>
> But then you can't anymore have #define PTRS_PER_PMD 1 from
> <asm-generic/pgtable-nop4d.h>
>
>>
>> And the rest will need care too. I hope moving downward is easier
>> (e.g. the walker should always exist for lower levels but not always for
>> higher levels), but I actually have little idea on whether there's any
>> other implications, so please bare with me on stupid mistakes.
>>
>> I just hope pgd leaves don't exist already, then I think it'll be
>> simpler.
>>
>> Thanks,
>>
Digging into asm-generic/pgtable-nopmd.h, I see a definition of
pud_leaf() always returning 0, introduced by commit 2c8a81dc0cc5
("riscv/mm: fix two page table check related issues")
So should asm-generic/pgtable-nopud.h contain the same for p4d_leaf()
and asm-generic/pgtable-nop4d.h contain the same for pgd_leaf() ?
Christophe