Re: [PATCH V3 2/4] arm64/mm: Hold memory hotplug lock while walking for kernel page table dump

From: Anshuman Khandual
Date: Thu May 16 2019 - 07:08:11 EST


On 05/16/2019 03:53 PM, Mark Rutland wrote:
> Hi Michal,
>
> On Wed, May 15, 2019 at 06:58:47PM +0200, Michal Hocko wrote:
>> On Tue 14-05-19 14:30:05, Anshuman Khandual wrote:
>>> The arm64 pagetable dump code can race with concurrent modification of the
>>> kernel page tables. When leaf entries are modified concurrently, the dump
>>> code may log stale or inconsistent information for a VA range, but this is
>>> otherwise not harmful.
>>>
>>> When intermediate levels of table are freed, the dump code will continue to
>>> use memory which has been freed and potentially reallocated for another
>>> purpose. In such cases, the dump code may dereference bogus addresses,
>>> leading to a number of potential problems.
>>>
>>> Intermediate levels of table may be freed during memory hot-remove, or when
>>> installing a huge mapping in the vmalloc region. To avoid racing with these
>>> cases, take the memory hotplug lock when walking the kernel page table.
>>
>> Why is this a problem only on arm64
>
> It looks like it's not -- I think we're just the first to realise this.
>
> AFAICT x86's debugfs ptdump has the same issue if run concurrently with
> memory hot-remove. If 32-bit arm supported hot-remove, its ptdump code
> would have the same issue.
>
>> and why do we even care for debugfs? Does anybody rely on this thing
>> to be reliable? Do we even need it? Who is using the file?
>
> The debugfs part is used intermittently by a few people working on the
> arm64 kernel page tables. We use that both to sanity-check that kernel
> page tables are created/updated correctly after changes to the arm64 mmu
> code, and also to debug if/when we encounter issues that appear to be the
> result of kernel page table corruption.
>
> So while it's rare to need it, it's really useful to have when we do
> need it, and I'd rather not remove it. I'd also rather that it didn't
> have latent issues where we can accidentally crash the kernel when using
> it, which is what this patch is addressing.
>
>> I am asking because I would really love to make mem hotplug locking less
>> scattered outside of the core MM than more. Most users simply shouldn't
>> care. Pfn walkers should rely on pfn_to_online_page.
>
> I'm not sure if that would help us here; IIUC pfn_to_online_page() alone
> doesn't ensure that the page remains online. Is there a way to achieve
> that other than get_online_mems()?

Still wondering how pfn_to_online_page() is applicable here. It validates a
given PFN and checks whether it is online from the sparse section mapping
perspective before returning its struct page. IIUC it is used during a linear
scan of a physical address range, not for a page table walk. So how can it
solve the problem when a struct page which was used as an intermediate level
page table page gets released back to the buddy allocator by another
concurrent thread?
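
To make the distinction concrete, here is a rough sketch (only
pfn_to_online_page(), get_online_mems() and put_online_mems() are real
interfaces; the example_* functions are illustrative stand-ins, not the
actual patch hunk):

#include <linux/memory_hotplug.h>
#include <linux/mm.h>

/*
 * Pattern A: a linear PFN walker. pfn_to_online_page() fits here because
 * each PFN is validated against the sparse section map (including its
 * online state) before the corresponding struct page is touched.
 */
static void example_pfn_scan(unsigned long start_pfn, unsigned long end_pfn)
{
	unsigned long pfn;

	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		struct page *page = pfn_to_online_page(pfn);

		if (!page)
			continue;	/* hole or offline section */
		/* ... examine *page ... */
	}
}

/*
 * Pattern B: the kernel page table dump follows live table pointers rather
 * than scanning PFNs, so there is no per-PFN lookup to validate. Instead the
 * whole walk has to be bracketed with the hotplug lock to stop intermediate
 * table pages from being freed (and reused) while still being dereferenced,
 * which is what this patch does.
 */
static void example_walk_kernel_pgd(pgd_t *pgd)
{
	/* ... descend through pgd/pud/pmd/pte and log the entries ... */
}

static void example_dump_kernel_page_table(void)
{
	get_online_mems();
	example_walk_kernel_pgd(init_mm.pgd);
	put_online_mems();
}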