Re: [PATCH] mm: Teach pfn_to_online_page() about ZONE_DEVICE section collisions

From: David Hildenbrand
Date: Wed Jan 06 2021 - 06:24:33 EST


On 06.01.21 11:42, Michal Hocko wrote:
> On Wed 06-01-21 10:56:19, David Hildenbrand wrote:
> [...]
>> Note that this is not sufficient in the general case. I already
>> mentioned that we effectively override an already initialized memmap.
>>
>> ---
>>
>> [ SECTION ]
>> Before:
>> [ ZONE_NORMAL ][ Hole ]
>>
>> The hole has some node/zone (currently 0/0, discussions ongoing on how
>> to optimize that to e.g., ZONE_NORMAL in this example) and is
>> PG_reserved - looks like an ordinary memory hole.
>>
>> After memremap:
>> [ ZONE_NORMAL ][ ZONE_DEVICE ]
>>
>> The already initialized memmap was converted to ZONE_DEVICE. Your
>> slowpath will work.
>>
>> After memunmap (no poisioning):
>> [ ZONE_NORMAL ][ ZONE_DEVICE ]
>>
>> The slow path is no longer working. pfn_to_online_page() might return
>> something that is ZONE_DEVICE.
>>
>> After memunmap (poisioning):
>> [ ZONE_NORMAL ][ POISONED ]
>>
>> The slow path is no longer working. pfn_to_online_page() might return
>> something that will BUG_ON via page_to_nid() etc.
>>
>> ---
>>
>> Reason is that pfn_to_online_page() does no care about sub-sections. And
>> for now, it didn't had to. If there was an online section, it either was
>>
>> a) Completely present. The whole memmap is initialized to sane values.
>> b) Partially present. The whole memmap is initialized to sane values.
>>
>> memremap/memunmap messes with case b)
>
> I do not see we ever clear the newly added flag and my understanding is
> that the subsection removed would lead to get_dev_pagemap returning a
> NULL. Which would obviously need to be checked for pfn_to_online_page.
> Or do I miss anything and the above is not the case and we could still
> get false positives?

See my example above ("After memunmap").

We're still in the slow pathg. pfn_to_online_page() will return a struct
page as get_dev_pagemap() is now NULL.

Yet page_zone(page) will either
- BUG_ON (memmap was poisoned)
- return ZONE_DEVICE zone (memmap not poisoned when memunmapping)

As I said, can be tackled by checking for pfn_section_valid() at least
on the slow path. Ideally also on the fast path.

>
>> Well have to further tweak pfn_to_online_page(). You'll have to also
>> check pfn_section_valid() *at least* on the slow path. Less-hacky would
>> be checking it also in the "somehwat-faster" path - that would cover
>> silently overriding a memmap that's visible via pfn_to_online_page().
>> Might slow down things a bit.
>>
>>
>> Not completely opposed to this, but I would certainly still prefer just
>> avoiding this corner case completely instead of patching around it. Thanks!
>
> Well, I would love to have no surprises either. So far there was not
> actual argument why the pmem reserved space cannot be fully initialized.

Yes, I'm still hoping Dan can clarify that.

> On the other hand making sure that pfn_to_online_page sounds like the
> right thing to do. And having an explicit check for zone device there in
> a slow path makes sense to me.

As I said, I'd favor to simplify and just get rid of the special case,
instead of coming up with increasingly complex ways to deal with it.
pfn_to_online_page() used to be simple, essentially checking a single
flag was sufficient in most setups.

--
Thanks,

David / dhildenb