Re: [RFC 1/1] x86/vmemmap: Add missing update of PML4 table / PML5 table entry
From: Dave Hansen
Date: Fri Feb 14 2025 - 19:29:15 EST
On 2/14/25 16:20, Harry (Hyeonggon) Yoo wrote:
> On Fri, Feb 14, 2025 at 11:57:50AM -0800, Dave Hansen wrote:
>> On 2/14/25 11:51, Gwan-gyeong Mun wrote:
>>> When performing vmemmap populate, if the entry of the PML4 table/PML5 table
>>> pointing to the target virtual address has never been updated, a page fault
>>> occurs when memset(start) is called from the vmemmap_use_new_sub_pmd()
>>> execution flow.
>>
>> "Page fault" meaning oops? Or something that we manage to handle and
>> return from without oopsing?
>
> It means an oops, because the kernel accesses a part of vmemmap that's not
> populated (yet) in the current process's page table.
Your 0/1 cover letter got to me after this mail did. I see the oops
there clear as day now.
> This oops was observed after increasing the size of struct page (as a part of
> developing a debug feature), but the real cause is that page table entries are
> only installed in init_mm's page table and then sync'd later, while in the
> meantime the process that triggered hot-plug accesses the new portion of vmemmap.
>
> If the process does not directly use the page table of init_mm (as swapper
> does), this oops can occur (e.g., I was able to trigger it with
> `sudo modprobe hmm_test` after increasing the size of struct page).
Makes sense. Thanks for the explanation.
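
The window described above can be pictured with a toy model in plain
user-space C. This is only a sketch under stated assumptions: it is not
kernel code, and every name in it (toy_pgd, toy_populate, toy_access_ok,
toy_sync) is hypothetical and made up for illustration. It models two
top-level tables, one standing in for init_mm and one for the process that
triggered hot-plug. An entry is installed in the first table only, so an
access through the second table fails until a later sync copies the entry
over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOP_ENTRIES 512 /* slots in one toy top-level table */

/* Toy top-level page table: a slot is either empty (NULL) or points
 * to a lower-level table. All names here are hypothetical. */
struct toy_pgd {
	void *entry[TOP_ENTRIES];
};

static struct toy_pgd toy_init_mm; /* stands in for init_mm's page table */
static struct toy_pgd toy_process; /* stands in for the hot-plugging process */
static int toy_lower_table;        /* placeholder for a lower-level table */

/* Populate: install the entry in the reference table only. */
static void toy_populate(size_t idx)
{
	assert(idx < TOP_ENTRIES);
	toy_init_mm.entry[idx] = &toy_lower_table;
}

/* An access through 'pgd' works only if the slot is present there;
 * a false return models the fault that ends in an oops. */
static bool toy_access_ok(const struct toy_pgd *pgd, size_t idx)
{
	assert(idx < TOP_ENTRIES);
	return pgd->entry[idx] != NULL;
}

/* Deferred sync: copy one slot from the reference table to 'dst'. */
static void toy_sync(struct toy_pgd *dst, size_t idx)
{
	assert(idx < TOP_ENTRIES);
	dst->entry[idx] = toy_init_mm.entry[idx];
}
```

Calling toy_populate() and then toy_access_ok() on toy_process before
toy_sync() reproduces the ordering problem in miniature: the access is
attempted while the entry exists only in the reference table.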
>>> This fixes the problem of using the virtual address without updating the
>>> entry in the PML4 table or PML5 table. But this is a temporary solution to
>>> prevent page fault problems, and it requires improvement of the routine
>>> that updates the missing entry in the PML4 table or PML5 table.
>>
>> Can we please skip past the band-aid and go to the real fix?
>
> Yes, of course it'd be best to skip a temporary fix.
> The intention is to report/discuss the problem and a fix as a starting point.
Do you have a better fix in mind?