Re: [RFC 1/1] x86/vmemmap: Add missing update of PML4 table / PML5 table entry

From: Gwan-gyeong Mun
Date: Mon Feb 17 2025 - 06:42:33 EST

On 2/15/25 2:29 AM, Dave Hansen wrote:
On 2/14/25 16:20, Harry (Hyeonggon) Yoo wrote:
On Fri, Feb 14, 2025 at 11:57:50AM -0800, Dave Hansen wrote:
On 2/14/25 11:51, Gwan-gyeong Mun wrote:
When performing vmemmap populate, if the PML4/PML5 table entry pointing to
the target virtual address has never been updated, a page fault occurs when
memset(start) is called from the vmemmap_use_new_sub_pmd() execution flow.

"Page fault" meaning oops? Or something that we manage to handle and
return from without oopsing?

It means oops, because the kernel accesses a part of vmemmap that is not
populated (yet) in the current process's page table.

Your 0/1 cover letter got to me after this mail did. I see the oops
there clear as day now.

This oops was observed after increasing the size of struct page (as part of
developing a debug feature), but the real cause is that page table entries are
only installed in init_mm's page table and synced later, and in the meantime
the process that triggered the hot-plug accesses the new portion of vmemmap.

If the process does not directly use init_mm's page table (as swapper does),
this oops can occur (e.g., I was able to trigger it with `sudo modprobe hmm_test`
after increasing the size of struct page).
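To make the ordering problem concrete, here is a toy user-space model of the race, not kernel code: a "top-level table" is just an array of entries (0 = not present), populate writes only the reference table standing in for init_mm, and a separate sync step later copies entries into other tables. All names (toy_mm, toy_populate, toy_sync, ...) are hypothetical and the real PGD/P4D handling is far richer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: 0 means "entry not present". */
#define TOY_TOP_ENTRIES 8

struct toy_mm {
	uint64_t top[TOY_TOP_ENTRIES];
};

/* Hypothetical stand-in for vmemmap populate: the new top-level entry
 * is installed only in the reference (init_mm) table. */
static void toy_populate(struct toy_mm *init, unsigned int idx, uint64_t entry)
{
	init->top[idx] = entry;
}

/* An access through a given mm "faults" if its top-level entry is absent. */
static bool toy_access_faults(const struct toy_mm *mm, unsigned int idx)
{
	return mm->top[idx] == 0;
}

/* Deferred synchronization: copy entries present in init but missing in mm. */
static void toy_sync(struct toy_mm *mm, const struct toy_mm *init)
{
	for (unsigned int i = 0; i < TOY_TOP_ENTRIES; i++)
		if (mm->top[i] == 0 && init->top[i] != 0)
			mm->top[i] = init->top[i];
}
```

Between toy_populate() and toy_sync(), an access through a process's own table (which is what the memset() in the hot-plug path effectively does) hits a missing entry, while the same access through init_mm would succeed.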

Makes sense. Thanks for the explanation.

This fixes the problem of using the virtual address without first updating
the PML4 or PML5 table entry. However, this is only a temporary solution to
prevent the page fault, and the routine that updates the missing PML4/PML5
table entry still needs to be improved.
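A rough toy sketch of what this band-aid amounts to, in the same spirit as a user-space model rather than the actual patch: when the entry is installed in the reference (init_mm) table, it is also copied into the current process's table if that table is different and lacks it. The function and type names are hypothetical; other processes would still depend on the later sync.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: 0 means "entry not present". */
#define TOY_TOP_ENTRIES 8

struct toy_mm {
	uint64_t top[TOY_TOP_ENTRIES];
};

/* Band-aid in the toy model: install the entry in init's table and, if the
 * current mm is a different table that lacks it, copy it there as well so
 * the immediate access from the hot-plug path does not fault. */
static void toy_populate_and_fix_current(struct toy_mm *init,
					 struct toy_mm *current_mm,
					 unsigned int idx, uint64_t entry)
{
	init->top[idx] = entry;
	if (current_mm != init && current_mm->top[idx] == 0)
		current_mm->top[idx] = entry;
}

/* An access through a given mm "faults" if its top-level entry is absent. */
static bool toy_access_faults(const struct toy_mm *mm, unsigned int idx)
{
	return mm->top[idx] == 0;
}
```

This only patches the one table that is about to be used; any other mm remains stale until the normal synchronization runs, which is why it is described as a temporary fix.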

Can we please skip past the band-aid and go to the real fix?

Yes, of course it'd be best to skip a temporary fix.
The intention is to report and discuss the problem, with a fix as a starting point.

Do you have a better fix in mind?

Yes. The first idea that comes to mind for safely accessing the virtual address is this: if the current top-level page table is not init_mm's page table when a vmemmap-based virtual address is accessed before the page table sync, translate the vmemmap-based virtual address into the corresponding direct-mapped virtual address and use that instead.
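A toy model of that translation, assuming the physical page backing a vmemmap chunk can be found through init_mm (which is always updated first) and that the direct map is a linear alias of physical memory at a fixed offset. The constants, the lookup array and toy_vmemmap_to_direct() are all hypothetical and do not reflect the real x86-64 layout; in the kernel this would presumably involve a walk of init_mm's page tables followed by the direct-map address of the backing page.

```c
#include <stdint.h>

/* Hypothetical toy constants; not the real x86-64 layout values. */
#define TOY_PAGE_SHIFT    12
#define TOY_PAGE_SIZE     (1ULL << TOY_PAGE_SHIFT)
#define TOY_VMEMMAP_BASE  0x100000000ULL
#define TOY_PAGE_OFFSET   0x800000000ULL   /* start of the toy direct map */
#define TOY_VMEMMAP_PAGES 16

/* init_mm's view of vmemmap: virtual page index -> physical frame number.
 * Toy simplification: pfn 0 means "not populated". */
static uint64_t toy_init_vmemmap_pfn[TOY_VMEMMAP_PAGES];

/* Translate a vmemmap virtual address into the direct-map alias of the same
 * physical memory, using init_mm's mapping rather than the current mm's.
 * Returns 0 if the address is outside vmemmap or not yet populated. */
static uint64_t toy_vmemmap_to_direct(uint64_t vaddr)
{
	if (vaddr < TOY_VMEMMAP_BASE)
		return 0;

	uint64_t idx = (vaddr - TOY_VMEMMAP_BASE) >> TOY_PAGE_SHIFT;
	if (idx >= TOY_VMEMMAP_PAGES || toy_init_vmemmap_pfn[idx] == 0)
		return 0;

	/* Physical address = backing frame + offset within the page. */
	uint64_t phys = (toy_init_vmemmap_pfn[idx] << TOY_PAGE_SHIFT) |
			(vaddr & (TOY_PAGE_SIZE - 1));
	return TOY_PAGE_OFFSET + phys;
}
```

The appeal of this approach is that the direct map is already present in every process's top-level table, so writing through the direct-map alias does not depend on the vmemmap entry having been synced yet.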

I will send a patch first with this idea.
If you have any better ideas, please let me know.

Br,
G.G.