Re: [PATCH v1] mm: optimize memory hotplug

From: Pavel Tatashin
Date: Wed Jan 31 2018 - 13:38:31 EST


Hi Michal,

> So how do we check that there is no page_to_nid() user before we online
> the page?

The poisoning helps to catch these now, and will continue to do so in
the future. Because we set "struct page" to all 1s, we get a nid that
is bigger than any supported node, and thus panic due to a NULL pointer
dereference or for some other reason.
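
A rough user-space sketch of the failure mode described above. This is
not kernel code: the struct layouts, the 6-bit node field, and every
name ending in _model are assumptions made for illustration, and the
real layout depends on the kernel configuration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: the node id lives in the top NODES_SHIFT bits of flags. */
#define NODES_SHIFT   6
#define MAX_NUMNODES  (1 << NODES_SHIFT)
#define NODES_PGSHIFT (64 - NODES_SHIFT)
#define NODES_MASK    ((1ULL << NODES_SHIFT) - 1)

/* Hypothetical stand-ins for struct page and pg_data_t. */
struct page_model  { uint64_t flags; uint64_t rest[7]; };
struct pgdat_model { int node_id; };

/* Only node 0 exists; every other slot stays NULL. */
static struct pgdat_model node0 = { .node_id = 0 };
static struct pgdat_model *node_data_model[MAX_NUMNODES] = { &node0 };

/* Poison an uninitialized page by filling it with all 1s. */
static void poison_page(struct page_model *page)
{
    memset(page, 0xff, sizeof(*page));
}

/* Extract the node id from the flags word. */
static int page_to_nid_model(const struct page_model *page)
{
    return (int)((page->flags >> NODES_PGSHIFT) & NODES_MASK);
}

/* Look up per-node data; returns NULL for a node that was never set up. */
static struct pgdat_model *node_data_lookup(int nid)
{
    assert(nid >= 0 && nid < MAX_NUMNODES);
    return node_data_model[nid];
}
```

In this model a poisoned page yields the highest possible nid, for
which no node data exists, so the lookup returns NULL. Dereferencing
that result is the kind of access that ends in a null-ptr-deref report.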

For example, if I change get_section_nid() back to pfn_to_nid() in
online_pages(), I get a panic like this:

[ 45.473228] BUG: KASAN: null-ptr-deref in zone_for_pfn_range+0xce/0x240
[ 45.475273] Read of size 8 at addr 0000000000000068 by task bash/144
[ 45.477240]
[ 45.477744] CPU: 0 PID: 144 Comm: bash Not tainted 4.15.0-next-20180130_pt_memset #11
[ 45.479947] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
[ 45.482053] Call Trace:
[ 45.482589] dump_stack+0xa6/0x109
[ 45.483304] ? _atomic_dec_and_lock+0x137/0x137
[ 45.484248] ? zone_for_pfn_range+0xce/0x240
[ 45.485140] kasan_report+0x208/0x350
[ 45.485916] zone_for_pfn_range+0xce/0x240
[ 45.486787] online_pages+0xf0/0x4a0

> I remember I was fighting strange bugs when reworking this
> code. I have forgot all the details of course, I just remember some
> nasty and subtle code paths. Maybe we have got rid of those in the past
> year but this should be done really carefully. We might have similar
> dependences on PageReserved.

I am adding a new PG_POISON_CHECK() to help with both Page* macros,
and page_to_nid(). A new patch is coming.
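
The patch itself is not part of this message. As a hedged illustration
only, a check of this kind could look roughly like the user-space
sketch below. All names, the all-1s pattern constant, and the 6-bit
node field are assumptions, and a kernel version would presumably
trigger a BUG rather than return an error value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed: poisoned pages are filled with all 1s, so flags is all 1s too. */
#define POISON_PATTERN_SKETCH UINT64_MAX

/* Assumed layout: the node id lives in the top 6 bits of flags. */
#define NODES_SHIFT   6
#define NODES_PGSHIFT (64 - NODES_SHIFT)
#define NODES_MASK    ((1ULL << NODES_SHIFT) - 1)

/* Hypothetical stand-in for struct page. */
struct page_sketch { uint64_t flags; uint64_t rest[7]; };

/* True if the page still carries the poison pattern in its flags word. */
static bool page_poisoned(const struct page_sketch *page)
{
    assert(page != NULL);
    return page->flags == POISON_PATTERN_SKETCH;
}

/*
 * Return the node id, or -1 if the page was never initialized.
 * The poison check runs before any field of flags is interpreted.
 */
static int checked_page_to_nid(const struct page_sketch *page)
{
    if (page_poisoned(page))
        return -1;
    return (int)((page->flags >> NODES_PGSHIFT) & NODES_MASK);
}
```

The point of placing the check inside the accessor is that every
caller gets it for free, instead of relying on a later crash to
reveal the use of an uninitialized page.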

Thank you,
Pavel