Re: [PATCH v2 2/2] mm/memory-failure.c: Don't access uninitialized memmaps in memory_failure()

From: David Hildenbrand
Date: Fri Oct 11 2019 - 06:13:22 EST


On 11.10.19 08:02, Naoya Horiguchi wrote:
> On Thu, Oct 10, 2019 at 09:58:40AM +0200, David Hildenbrand wrote:
>> On 10.10.19 09:52, David Hildenbrand wrote:
>>> On 10.10.19 09:35, Michal Hocko wrote:
>>>> On Thu 10-10-19 09:27:32, David Hildenbrand wrote:
>>>>> On 09.10.19 16:43, Michal Hocko wrote:
>>>>>> On Wed 09-10-19 16:24:35, David Hildenbrand wrote:
>>>>>>> We should check for pfn_to_online_page() to not access uninitialized
>>>>>>> memmaps. Reshuffle the code so we don't have to duplicate the error
>>>>>>> message.
>>>>>>>
>>>>>>> Cc: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
>>>>>>> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>>>>>>> Cc: Michal Hocko <mhocko@xxxxxxxxxx>
>>>>>>> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
>>>>>>> ---
>>>>>>> mm/memory-failure.c | 14 ++++++++------
>>>>>>> 1 file changed, 8 insertions(+), 6 deletions(-)
>>>>>>>
>>>>>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>>>>>> index 7ef849da8278..e866e6e5660b 100644
>>>>>>> --- a/mm/memory-failure.c
>>>>>>> +++ b/mm/memory-failure.c
>>>>>>> @@ -1253,17 +1253,19 @@ int memory_failure(unsigned long pfn, int flags)
>>>>>>> if (!sysctl_memory_failure_recovery)
>>>>>>> panic("Memory failure on page %lx", pfn);
>>>>>>>
>>>>>>> - if (!pfn_valid(pfn)) {
>>>>>>> + p = pfn_to_online_page(pfn);
>>>>>>> + if (!p) {
>>>>>>> + if (pfn_valid(pfn)) {
>>>>>>> + pgmap = get_dev_pagemap(pfn, NULL);
>>>>>>> + if (pgmap)
>>>>>>> + return memory_failure_dev_pagemap(pfn, flags,
>>>>>>> + pgmap);
>>>>>>> + }
>>>>>>> pr_err("Memory failure: %#lx: memory outside kernel control\n",
>>>>>>> pfn);
>>>>>>> return -ENXIO;
>>>>>>
>>>>>> Don't we need that earlier at hwpoison_inject level?
>>>>>>
>>>>>
>>>>> Theoretically yes, this is another instance. But pfn_to_online_page(pfn)
>>>>> alone would not be sufficient as discussed. We would, again, have to
>>>>> special-case ZONE_DEVICE via things like get_dev_pagemap() ...
>>>>>
>>>>> But mm/hwpoison-inject.c:hwpoison_inject() is a pure debug feature either way:
>>>>>
>>>>> /*
>>>>> * Note that the below poison/unpoison interfaces do not involve
>>>>> * hardware status change, hence do not require hardware support.
>>>>> * They are mainly for testing hwpoison in software level.
>>>>> */
>>>>>
>>>>> So it's not that bad compared to memory_failure() called from real HW or
>>>>> drivers/base/memory.c:soft_offline_page_store()/hard_offline_page_store()
>>>>
>>>> Yes, this is just a toy. And yes we need to handle zone device pages
>>>> here because a) people likely want to test MCE behavior even on these
>>>> pages and b) HW can really trigger MCEs there as well. I was just
>>>> pointing that the patch is likely incomplete.
>>>>
>>>
>>> I rather think this deserves a separate patch as it is a separate
>>> interface :)
>>>
>>> I do wonder why hwpoison_inject() has to perform so much extra work
>>> compared to other memory_failure() users. This smells like legacy
>>> leftovers to me, but I might be wrong. The interface is fairly old,
>>> though. Does anybody know why we need this magic? I can spot quite some
>>> duplicate checks/things getting performed.
>
> It concerns me too, this *is* an old legacy code. I guess it was left as-is
> because no one complained about it. That's not good, so I'll do some cleanup.

Most of that stuff was introduced in

commit 31d3d3484f9bd263925ecaa341500ac2df3a5d9b
Author: Wu Fengguang <fengguang.wu@xxxxxxxxx>
Date: Wed Dec 16 12:19:59 2009 +0100

HWPOISON: limit hwpoison injector to known page types

__memory_failure()'s workflow is

set PG_hwpoison
//...
unset PG_hwpoison if didn't pass hwpoison filter

That could kill unrelated process if it happens to page fault on the
page with the (temporary) PG_hwpoison. The race should be big enough to
appear in stress tests.

Fix it by grabbing the page and checking filter at inject time. This
also avoids the very noisy "Injecting memory failure..." messages.


Now, we still have the same "issue" in memory_failure() today:


if (TestSetPageHWPoison(p)) {
pr_err("Memory failure: %#lx: already hardware poisoned\n",
pfn);
return 0;
}
[...]
if (hwpoison_filter(p)) {
if (TestClearPageHWPoison(p))
num_poisoned_pages_dec();
unlock_page(p);
put_hwpoison_page(p);
return 0;
}

However, I don't understand why we need that special handling only for this
debug interface and not the other users.

I'd vote for ripping out that legacy crap (so the interface works correctly
with ZONE_DEVICE) and instead (if really required) rework memory_failure()
to not produce such side effects.

>
>>>
>>> Naiive me would just make the interface perform the same as
>>> hard_offline_page_store(). But most probably I am not getting the real
>>> purpose of both different interfaces.
>
> Maybe for historical reason, we have these slightly different interfaces:
>
> - corrupt-pfn
> - purely for debugging purpose
> - paired with unpoison-pfn
> - used by in-kernel tool tools/vm/page-types.c
> - hard_offline_page
> - paired with soft_offline_page
> - used by other userspace tools like mcelog
>
> But these don't explain why implemented differently, so I think that both
> should be written in the same manner.
>
>>>
>>> HWPOISON_INJECT is only selected for DEBUG_KERNEL, so I would have
>>> guessed that fixing this is not urgent.
>>>
>>> BTW: mm/memory-failure.c:soft_offline_page() also looks wrong and needs
>>> fixing to make sure we access initialized memmaps.
>>>
>>
>> To be more precise, soft_offline_page_store() needs a
>> pfn_to_online_page() check. Will send a patch.
>
> Thanks for finding this.
>
> - Naoya Horiguchi
>


--

Thanks,

David / dhildenb