Re: [PATCH RFC 06/12] mm/gup: Drop folio_fast_pin_allowed() in hugepd processing

From: Christophe Leroy
Date: Fri Nov 24 2023 - 02:03:21 EST




On 23/11/2023 at 20:37, Peter Xu wrote:
> On Thu, Nov 23, 2023 at 06:22:33PM +0000, Christophe Leroy wrote:
>>> For fast-gup I think the hugepd code is in use, however for walk_page_*
>>> apis hugepd code shouldn't be reached iiuc as we have the hugetlb specific
>>> handling (walk_hugetlb_range()), so anything within walk_pgd_range() to hit
>>> a hugepd can be dead code to me (but note that this "dead code" is good
>>> stuff to me, if one would like to merge hugetlb instead into generic mm).
>>
>> Not sure what you mean here. What do you mean by "dead code" ?
>> A hugepage directory can be plugged at any page level, from PGD to PMD.
>> So the following bit in walk_pgd_range() is valid and not dead:
>>
>>     if (is_hugepd(__hugepd(pgd_val(*pgd))))
>>         err = walk_hugepd_range((hugepd_t *)pgd, addr, next, walk, PGDIR_SHIFT);
>
> IMHO it boils down to the question on whether hugepd is only used in
> hugetlbfs. I think I already mentioned that above, but I can be more
> explicit; what I see is that from higher stack in __walk_page_range():
>
>     if (is_vm_hugetlb_page(vma)) {
>         if (ops->hugetlb_entry)
>             err = walk_hugetlb_range(start, end, walk);
>     } else
>         err = walk_pgd_range(start, end, walk);
>
> It means to me as long as the vma is hugetlb, it'll not trigger any code in
> walk_pgd_range(), but only walk_hugetlb_range(). Do you perhaps mean
> hugepd is used outside hugetlbfs?

I added that code in commit e17eae2b8399 ("mm: pagewalk: fix walk for
hugepage tables") because I was getting nonsensical output when dumping
/sys/kernel/debug/pagetables.

Huge pages can be used for many things.

On powerpc 8xx, there are 4 possible page sizes: 4k, 16k, 512k and 8M.
Each PGD entry covers a 4M area, so hugepd is used for anything using
8M pages. Regular page tables could have been used instead, but it is
not worth allocating a 4k table when the HW will only read the first
entry.
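
To make the arithmetic above concrete, here is a minimal sketch in plain
userspace C. It is not kernel code: the EX_* constants and the helper
names are made up for illustration, and the 4-byte entry size is my
assumption for a 32-bit platform, not something stated above.

```c
#include <assert.h>

/* Sizes quoted in the text; the EX_* names are illustrative only. */
#define EX_SZ_4K (4UL << 10)
#define EX_SZ_4M (4UL << 20)
#define EX_SZ_8M (8UL << 20)

/* How many PGD entries a single page of page_size occupies. */
static unsigned long pgd_entries_per_page(unsigned long page_size,
                                          unsigned long pgd_span)
{
    assert(pgd_span != 0);
    return page_size <= pgd_span ? 1 : page_size / pgd_span;
}

/* Entries in a regular page table covering one PGD entry's span. */
static unsigned long ptes_per_table(unsigned long pgd_span,
                                    unsigned long base_page)
{
    assert(base_page != 0);
    return pgd_span / base_page;
}

/* Table size in bytes; entry_size is an assumption (4 on 32-bit). */
static unsigned long table_bytes(unsigned long entries,
                                 unsigned long entry_size)
{
    return entries * entry_size;
}
```

Under those assumptions, pgd_entries_per_page(EX_SZ_8M, EX_SZ_4M) gives
2, and a regular table for one 4M PGD span with 4k pages would hold
ptes_per_table(EX_SZ_4M, EX_SZ_4K) entries, i.e. a full 4k allocation of
which only the first entry would be read for an 8M page.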

At present, the linear memory mapping is performed with 8M pages, so
ptdump_walk_pgd() will walk into huge page directories.
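
As a rough illustration of why that path is reachable, here is a toy
userspace model of the dispatch discussed above. The toy_* types and
names are hypothetical and only mimic the shape of the decision; the
real logic lives in __walk_page_range() and walk_pgd_range(). The model
assumes a kernel-mapping walk has no hugetlb VMA attached.

```c
#include <stdbool.h>
#include <stddef.h>

/* Which branch a walk would take in this toy model. */
enum toy_walk_path {
    TOY_WALK_HUGETLB,     /* hugetlb VMA: hugetlb-specific walker */
    TOY_WALK_PGD_REGULAR, /* generic walk, regular page tables */
    TOY_WALK_PGD_HUGEPD   /* generic walk that meets a hugepd entry */
};

struct toy_vma {
    bool is_hugetlb;
};

struct toy_range {
    const struct toy_vma *vma; /* NULL models a kernel mapping */
    bool top_entry_is_hugepd;  /* top-level entry is a huge page directory */
};

static enum toy_walk_path toy_walk_path_for(const struct toy_range *r)
{
    /* hugetlb VMAs never reach the generic PGD walk */
    if (r->vma != NULL && r->vma->is_hugetlb)
        return TOY_WALK_HUGETLB;

    /* everything else does, and may find a hugepd there */
    return r->top_entry_is_hugepd ? TOY_WALK_PGD_HUGEPD
                                  : TOY_WALK_PGD_REGULAR;
}
```

In this model, a linear-mapping range (vma == NULL, top_entry_is_hugepd
== true) ends up in the hugepd branch of the generic walk, which is the
case the hugepd check in walk_pgd_range() is there for.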

Also, huge pages can be used in vmalloc() and in vmap(). At present we
support 512k pages there on the 8xx. 8M pages will be supported once
vmalloc() and vmap() support hugepd, as explained in commit
a6a8f7c4aa7e ("powerpc/8xx: add support for huge pages on VMAP and VMALLOC").

So yes, in conclusion, hugepd is used outside hugetlbfs. I hope that
clarifies things.

Christophe