Re: [PATCH] mm/page_owner: fix a crash after memory offline

From: David Hildenbrand
Date: Thu Oct 10 2019 - 14:55:47 EST


On 10.10.19 20:32, Qian Cai wrote:
The linux-next series "mm/memory_hotplug: Shrink zones before removing
memory" [1] seems make a crash easier to reproduce while reading
/proc/pagetypeinfo after offlining a memory section. Fix it by using
pfn_to_online_page() in the PFN walker.

Can you please rephrase the subject+description to describe the actual problem and drop the reference to the series?

E.g., similar to my recent patches:

"mm/page_owner: Don't access uninitialized memmaps when reading /proc/pagetypeinfo

Uninitialized memmaps contain garbage and in the worst case trigger kernel BUGs, especially with CONFIG_PAGE_POISONING. They should not get
touched.

For example, when not onlining a memory block that is spanned by a zone and reading /proc/pagetypeinfo, we can trigger a kernel BUG: ...
"

However, you also have to justify why it is okay to no longer consider ZONE_DEVICE (I think walk_zones_in_node() will skip ZONE_DEVICE due to assert_populated == true and ZONE_DEVICE will never be populated, Therefore, we will never end in this code path with ZONE_DEVICE).



[1] https://lore.kernel.org/linux-mm/20191006085646.5768-1-david@xxxxxxxxxx/

page:ffffea0021200000 is uninitialized and poisoned
raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
There is not page extension available.
------------[ cut here ]------------
kernel BUG at include/linux/mm.h:1107!
RIP: 0010:pagetypeinfo_showmixedcount_print+0x4fb/0x680
Call Trace:
walk_zones_in_node+0x3a/0xc0
pagetypeinfo_show+0x260/0x2c0
seq_read+0x27e/0x710
proc_reg_read+0x12e/0x190
__vfs_read+0x50/0xa0
vfs_read+0xcb/0x1e0
ksys_read+0xc6/0x160
__x64_sys_read+0x43/0x50
do_syscall_64+0xcc/0xaec
entry_SYSCALL_64_after_hwframe+0x49/0xbe

Signed-off-by: Qian Cai <cai@xxxxxx>
---
mm/page_owner.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/mm/page_owner.c b/mm/page_owner.c
index dee931184788..03a6b19b3cdd 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -296,11 +296,10 @@ void pagetypeinfo_showmixedcount_print(struct seq_file *m,
pageblock_mt = get_pageblock_migratetype(page);

What about the pfn_valid() in the outermost loop? You can skip over the whole pageblock if the first page is not online.

for (; pfn < block_end_pfn; pfn++) {
- if (!pfn_valid_within(pfn))
+ page = pfn_to_online_page(pfn);
+ if (!page)
continue;
- page = pfn_to_page(pfn);
-
if (page_zone(page) != zone)
continue;



--

Thanks,

David / dhildenb