Re: [PATCH] hugetlb: simplify hugetlb handling in follow_page_mask

From: David Hildenbrand
Date: Mon Sep 05 2022 - 05:46:33 EST


On 05.09.22 11:33, Christophe Leroy wrote:


On 05/09/2022 at 10:37, David Hildenbrand wrote:
On 03.09.22 09:07, Christophe Leroy wrote:
+Resending with valid powerpc list address

On 02/09/2022 at 20:52, David Hildenbrand wrote:
Adding Christophe on Cc:

Christophe, do you know if is_hugepd is true for all hugetlb entries,
not just hugepd?

is_hugepd() is true if and only if the directory entry points to a huge
page directory and not to the normal lower level directory.

As far as I understand if the directory entry is not pointing to any
lower directory but is a huge page entry, pXd_leaf() is true.



On systems without hugepd entries, I guess ptdump skips all hugetlb
entries.
Sigh!

As far as I can see, ptdump_pXd_entry() handles the pXd_leaf() case.


IIUC, the idea of ptdump_walk_pgd() is to dump page tables even outside
VMAs (for debugging purposes?).

I cannot convince myself that that's a good idea when only holding the
mmap lock in read mode, because we can just see page tables getting
freed concurrently, e.g., during concurrent munmap() ... while holding
the mmap lock in read mode we may only walk inside VMA boundaries.

That then raises the question of whether we're only calling this on
special MMs (e.g., init_mm), where we cannot really see concurrent
munmap() and where we shouldn't have hugetlb mappings or hugepd entries.

At least on powerpc, PTDUMP handles only init_mm.

Huge pages are used at least on powerpc 8xx for linear memory mapping, see

commit 34536d780683 ("powerpc/8xx: Add a function to early map kernel
via huge pages")
commit cf209951fa7f ("powerpc/8xx: Map linear memory with huge pages")

hugepds may also be used in the future to use huge pages for vmap and
vmalloc, see commit a6a8f7c4aa7e ("powerpc/8xx: add support for huge
pages on VMAP and VMALLOC")

As far as I know, ppc64 also uses huge pages for VMAP and VMALLOC, see

commit d909f9109c30 ("powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAP")
commit 8abddd968a30 ("powerpc/64s/radix: Enable huge vmalloc mappings")

There is a difference between an ordinary huge mapping (e.g., as used
for THP) and a hugetlb mapping.

Our current understanding is that hugepd only applies to hugetlb.
Wouldn't vmap/vmalloc use ordinary huge pmd entries instead of hugepd?


'hugepd' stands for huge page directory. It is independent of whether a
huge page is used for hugetlb or for anything else; it represents the
way pages are described in the page tables.

This patch here makes the assumption that hugepd only applies to hugetlb, because it removes any such handling from the !hugetlb path in GUP. Is that incorrect, or are there valid cases where that could happen? (init_mm is special in that regard, I don't think it interacts with GUP at all).


I don't know what you mean by _ordinary_ huge pmd entry.


Essentially, what we use for THP. Let me try to understand how hugepd interacts with the rest of the system.

Do systems that support hugepd currently implement THP? Reading about the 32bit systems below, I assume not?

Let's take the example of powerpc 8xx which is the one I know best. This
is a powerpc32, so it has two levels : PGD and PTE. PGD has 1024 entries
and each entry covers a 4Mbytes area. Normal PTE has 1024 entries and
each entry is a 4k page. When you use 8Mbytes pages, you don't use PTEs
as it would be a waste of memory. You use a huge page directory that has
a single entry, and you have two PGD entries pointing to the huge page
directory.
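The arithmetic behind the 8xx layout described above can be sketched in a few lines of C. This is only an illustration under the sizes quoted in this mail (1024 PGD entries, 1024 PTEs per table, 4k base pages, 8M huge pages); the EX_* macros and both helpers are hypothetical names made up for this sketch, not kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants, taken from the 8xx description in this mail. */
#define EX_PAGE_SIZE      (4u * 1024u)                      /* 4k base page */
#define EX_PTRS_PER_PTE   1024u                             /* PTEs per page table */
#define EX_PTRS_PER_PGD   1024u                             /* entries in the PGD */
#define EX_PGD_ENTRY_SPAN (EX_PTRS_PER_PTE * EX_PAGE_SIZE)  /* 4M per PGD entry */
#define EX_HUGE_8M        (8u * 1024u * 1024u)              /* 8M huge page */

/* 1024 PGD entries of 4M each cover the whole 32-bit address space. */
static_assert((uint64_t)EX_PTRS_PER_PGD * EX_PGD_ENTRY_SPAN == (1ull << 32),
              "PGD must cover 4G");

/* Hypothetical helper: how many PGD entries one huge page spans. */
static inline unsigned int ex_pgd_entries_per_huge_page(uint32_t huge_size)
{
    return (huge_size + EX_PGD_ENTRY_SPAN - 1) / EX_PGD_ENTRY_SPAN;
}

/* Hypothetical helper: index of the PGD entry that covers addr. */
static inline unsigned int ex_pgd_index(uint32_t addr)
{
    return addr / EX_PGD_ENTRY_SPAN;
}
```

With these numbers an 8M page spans two consecutive PGD entries, which is why two PGD entries end up pointing to the same single-entry huge page directory.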

Thanks, I assume there are no 8MB THP, correct?

The 8MB example with 4MB PGD entries makes it sound a bit like the cont-PTE/cont-PMD handling on aarch64: they don't use a hugepd but would simply let two consecutive PGD entries point at the relevant (sub) parts of the hugetlb page. No hugepd involved.


Some time ago, hugepd was also used for 512kbytes pages and 16kbytes
pages:
- there were huge page directories with 8x 512kbytes pages,
- there were huge page directories with 256x 16kbytes pages.

And the PGD/PMD entry points to a huge page directory (HUGEPD) instead
of pointing to a page table directory (PTE).
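The entry counts quoted above follow from dividing the 4M span of one directory entry by the huge page size. A minimal sketch, assuming the 4M span from the 8xx example; EX_DIR_ENTRY_SPAN and ex_hugepd_nr_entries() are hypothetical names used only for illustration, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Span covered by one PGD/PMD entry in the 8xx example (assumed: 4M). */
#define EX_DIR_ENTRY_SPAN (4u * 1024u * 1024u)

static_assert(EX_DIR_ENTRY_SPAN == 1024u * 4096u,
              "one entry spans 1024 pages of 4k");

/*
 * Hypothetical helper: number of entries in a huge page directory for a
 * given huge page size. A page at least as large as the span of one
 * directory entry needs a single entry.
 */
static inline unsigned int ex_hugepd_nr_entries(uint32_t huge_size)
{
    if (huge_size >= EX_DIR_ENTRY_SPAN)
        return 1;
    return EX_DIR_ENTRY_SPAN / huge_size;
}
```

This gives 8 entries for 512k pages, 256 entries for 16k pages, and a single entry for 8M pages, matching the figures in this thread.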

Thanks for the example.


Since commit b250c8c08c79 ("powerpc/8xx: Manage 512k huge pages as
standard pages."), the 8xx no longer uses hugepd for 512k huge
pages, but other platforms like powerpc book3e extensively use huge page
directories.

I hope this clarifies the subject, otherwise I'm happy to provide
further details.

Thanks, it would be valuable to know whether the assumption in this patch is correct: hugepd will only be found in hugetlb areas in ordinary MMs (not init_mm).

--
Thanks,

David / dhildenb