Re: [PATCH -next] mm/hotplug: skip bad PFNs from pfn_to_online_page()

From: Qian Cai
Date: Thu Jun 13 2019 - 13:20:56 EST


On Wed, 2019-06-12 at 12:38 -0700, Dan Williams wrote:
> On Wed, Jun 12, 2019 at 12:37 PM Dan Williams <dan.j.williams@xxxxxxxxx>
> wrote:
> >
> > On Wed, Jun 12, 2019 at 12:16 PM Qian Cai <cai@xxxxxx> wrote:
> > >
> > > The linux-next commit "mm/sparsemem: Add helpers track active portions
> > > of a section at boot" [1] causes a crash below when the first kmemleak
> > > scan kthread kicks in. This is because kmemleak_scan() calls
> > > pfn_to_online_page() which calls pfn_valid_within() instead of
> > > pfn_valid() on x86 due to CONFIG_HOLES_IN_ZONE=n.
> > >
> > > The commit [1] did add an additional check of pfn_section_valid() in
> > > pfn_valid(), but forgot to add it in the above code path.
> > >
> > > page:ffffea0002748000 is uninitialized and poisoned
> > > raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
> > > raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
> > > page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
> > > ------------[ cut here ]------------
> > > kernel BUG at include/linux/mm.h:1084!
> > > invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
> > > CPU: 5 PID: 332 Comm: kmemleak Not tainted 5.2.0-rc4-next-20190612+ #6
> > > Hardware name: Lenovo ThinkSystem SR530 -[7X07RCZ000]-/-[7X07RCZ000]-,
> > > BIOS -[TEE113T-1.00]- 07/07/2017
> > > RIP: 0010:kmemleak_scan+0x6df/0xad0
> > > Call Trace:
> > >  kmemleak_scan_thread+0x9f/0xc7
> > >  kthread+0x1d2/0x1f0
> > >  ret_from_fork+0x35/0x4
> > >
> > > [1] https://patchwork.kernel.org/patch/10977957/
> > >
> > > Signed-off-by: Qian Cai <cai@xxxxxx>
> > > ---
> > >  include/linux/memory_hotplug.h | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/include/linux/memory_hotplug.h
> > > b/include/linux/memory_hotplug.h
> > > index 0b8a5e5ef2da..f02be86077e3 100644
> > > --- a/include/linux/memory_hotplug.h
> > > +++ b/include/linux/memory_hotplug.h
> > > @@ -28,6 +28,7 @@
> > >         unsigned long ___nr = pfn_to_section_nr(___pfn);           \
> > >                                                                    \
> > >         if (___nr < NR_MEM_SECTIONS && online_section_nr(___nr) && \
> > > +           pfn_section_valid(__nr_to_section(___nr), pfn) &&     \
> > >             pfn_valid_within(___pfn))                              \
> > >                 ___page = pfn_to_page(___pfn);                     \
> > >         ___page;                                                   \
> >
> > Looks ok to me:
> >
> > Acked-by: Dan Williams <dan.j.williams@xxxxxxxxx>
> >
> > ...but why is pfn_to_online_page() a multi-line macro instead of a
> > static inline like all the helper routines it invokes?
>
> I do need to send out a refreshed version of the sub-section patchset,
> so I'll fold this in and give you a Reported-by credit.

BTW, I'm not sure whether your new version will fix the two problems below,
which are caused by the same commit.

https://patchwork.kernel.org/patch/10977957/

1) Memory offline is busted [1]. It looks like test_pages_in_a_zone() is
missing the same pfn_section_valid() check.

2) powerpc booting generates endless warnings [2]. In vmemmap_populated() in
arch/powerpc/mm/init_64.c, I tried changing PAGES_PER_SECTION to
PAGES_PER_SUBSECTION, but that alone does not seem to be enough.
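
To make the failure mode in 1) concrete, here is a rough userspace model, not
kernel code. It assumes toy sizes (16 pages per section, 4 pages per
subsection) and hypothetical toy_* names that do not exist in the kernel. It
only illustrates why a section-granular "online" test can accept a PFN that
sits in an unpopulated subsection, while a check in the spirit of
pfn_section_valid() rejects it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy sizes for illustration only; these are NOT the kernel's values. */
#define TOY_PAGES_PER_SECTION    16UL
#define TOY_PAGES_PER_SUBSECTION 4UL
#define TOY_NR_SECTIONS          4UL

/* Hypothetical stand-in for the per-section state. */
struct toy_section {
	bool online;
	unsigned long subsection_map;	/* bit i set => subsection i is populated */
};

struct toy_section toy_sections[TOY_NR_SECTIONS];

unsigned long toy_pfn_to_section_nr(unsigned long pfn)
{
	return pfn / TOY_PAGES_PER_SECTION;
}

/* Models the idea behind pfn_section_valid(): test the subsection bit. */
bool toy_pfn_section_valid(const struct toy_section *ms, unsigned long pfn)
{
	unsigned long idx;

	assert(ms != NULL);
	idx = (pfn % TOY_PAGES_PER_SECTION) / TOY_PAGES_PER_SUBSECTION;
	return (ms->subsection_map >> idx) & 1UL;
}

/* Section-granular check only: holes inside an online section slip through. */
bool toy_online_section_only(unsigned long pfn)
{
	unsigned long nr = toy_pfn_to_section_nr(pfn);

	return nr < TOY_NR_SECTIONS && toy_sections[nr].online;
}

/* Same check with the additional subsection test. */
bool toy_online_with_subsection(unsigned long pfn)
{
	unsigned long nr = toy_pfn_to_section_nr(pfn);

	return nr < TOY_NR_SECTIONS && toy_sections[nr].online &&
	       toy_pfn_section_valid(&toy_sections[nr], pfn);
}

/* Section 1 is online, but only its first two subsections are populated. */
void toy_setup(void)
{
	toy_sections[1].online = true;
	toy_sections[1].subsection_map = 0x3;
}
```

After toy_setup(), PFN 25 (section 1, subsection 2) passes
toy_online_section_only() but fails toy_online_with_subsection(), whereas
PFN 21 (section 1, subsection 1) passes both.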

[1]
[  415.158451][ T1946] page:ffffea00016a0000 is uninitialized and poisoned
[  415.158459][ T1946] raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
[  415.226266][ T1946] raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
[  415.264284][ T1946] page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
[  415.294332][ T1946] page_owner info is not active (free page?)
[  415.320902][ T1946] ------------[ cut here ]------------
[  415.345340][ T1946] kernel BUG at include/linux/mm.h:1084!
[  415.370284][ T1946] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[  415.402589][ T1946] CPU: 12 PID: 1946 Comm: test.sh Not tainted 5.2.0-rc4-next-20190612+ #6
[  415.444923][ T1946] Hardware name: HP ProLiant XL420 Gen9/ProLiant XL420 Gen9, BIOS U19 12/27/2015
[  415.485079][ T1946] RIP: 0010:test_pages_in_a_zone+0x285/0x310
[  415.511320][ T1946] Code: c6 c0 96 4c a2 48 89 df e8 18 23 f6 ff 0f 0b 48 c7 c7 80 c7 ad a2 e8 ae c2 1f 00 48 c7 c6 c0 96 4c a2 48 89 cf e8 fb 22 f6 ff <0f> 0b 48 c7 c7 00 c8 ad a2 e8 91 c2 1f 00 48 85 db 0f 84 3c ff ff
[  415.598840][ T1946] RSP: 0018:ffff88832ba37930 EFLAGS: 00010292
[  415.625597][ T1946] RAX: 0000000000000000 RBX: ffff88847fff36c0 RCX: ffffffffa1b40b78
[  415.660713][ T1946] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88884743d380
[  415.695778][ T1946] RBP: ffff88832ba37988 R08: ffffed1108e87a71 R09: ffffed1108e87a70
[  415.730831][ T1946] R10: ffffed1108e87a70 R11: ffff88884743d387 R12: 0000000000060000
[  415.766058][ T1946] R13: 0000000000060000 R14: 0000000000060000 R15: 000000000005a800
[  415.800727][ T1946] FS:  00007fca293e7740(0000) GS:ffff888847400000(0000) knlGS:0000000000000000
[  415.840114][ T1946] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  415.868966][ T1946] CR2: 0000558da8ffffc0 CR3: 00000002bff10006 CR4: 00000000001606a0
[  415.904736][ T1946] Call Trace:
[  415.920601][ T1946]  __offline_pages+0xdd/0x990
[  415.942887][ T1946]  ? online_pages+0x4f0/0x4f0
[  415.963195][ T1946]  ? kasan_check_write+0x14/0x20
[  415.984710][ T1946]  ? __mutex_lock+0x2ac/0xb70
[  416.004986][ T1946]  ? device_offline+0x70/0x110
[  416.025654][ T1946]  ? klist_next+0x43/0x1c0
[  416.044819][ T1946]  ? __mutex_add_waiter+0xc0/0xc0
[  416.066741][ T1946]  ? do_raw_spin_unlock+0xa8/0x140
[  416.089036][ T1946]  ? klist_next+0xf2/0x1c0
[  416.108178][ T1946]  offline_pages+0x11/0x20
[  416.127490][ T1946]  memory_block_action+0x12e/0x210
[  416.149808][ T1946]  ? device_remove_class_symlinks+0xc0/0xc0
[  416.175650][ T1946]  memory_subsys_offline+0x7d/0xb0
[  416.197897][ T1946]  device_offline+0xd5/0x110
[  416.217800][ T1946]  ? memory_block_action+0x210/0x210
[  416.240809][ T1946]  state_store+0xc6/0xe0
[  416.259508][ T1946]  dev_attr_store+0x3f/0x60
[  416.279018][ T1946]  ? device_create_release+0x60/0x60
[  416.302081][ T1946]  sysfs_kf_write+0x89/0xb0
[  416.321625][ T1946]  ? sysfs_file_ops+0xa0/0xa0
[  416.341906][ T1946]  kernfs_fop_write+0x188/0x240
[  416.363700][ T1946]  __vfs_write+0x50/0xa0
[  416.382789][ T1946]  vfs_write+0x105/0x290
[  416.401087][ T1946]  ksys_write+0xc6/0x160
[  416.421144][ T1946]  ? __x64_sys_read+0x50/0x50
[  416.444824][ T1946]  ? fput+0x13/0x20
[  416.462255][ T1946]  ? filp_close+0x8e/0xa0
[  416.480951][ T1946]  ? __close_fd+0xe0/0x110
[  416.500343][ T1946]  __x64_sys_write+0x43/0x50
[  416.520327][ T1946]  do_syscall_64+0xc8/0x63b
[  416.540048][ T1946]  ? syscall_return_slowpath+0x120/0x120
[  416.564728][ T1946]  ? __do_page_fault+0x44d/0x5b0
[  416.586119][ T1946]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  416.611778][ T1946] RIP: 0033:0x7fca28ac63b8
[  416.630947][ T1946] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 63 2d 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
[  416.717953][ T1946] RSP: 002b:00007ffc33f8eb98 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  416.755847][ T1946] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007fca28ac63b8
[  416.790908][ T1946] RDX: 0000000000000008 RSI: 0000558daa079880 RDI: 0000000000000001
[  416.826002][ T1946] RBP: 0000558daa079880 R08: 000000000000000a R09: 00007ffc33f8e720
[  416.861054][ T1946] R10: 000000000000000a R11: 0000000000000246 R12: 00007fca28d98780
[  416.896253][ T1946] R13: 0000000000000008 R14: 00007fca28d93740 R15: 0000000000000008
[  416.932117][ T1946] Modules linked in: kvm_intel kvm irqbypass dax_pmem dax_pmem_core ip_tables x_tables xfs sd_mod igb i2c_algo_bit hpsa i2c_core scsi_transport_sas dm_mirror dm_region_hash dm_log dm_mod
[  417.019852][ T1946] ---[ end trace 5a30e75692517f36 ]---
[  417.044089][ T1946] RIP: 0010:test_pages_in_a_zone+0x285/0x310
[  417.070435][ T1946] Code: c6 c0 96 4c a2 48 89 df e8 18 23 f6 ff 0f 0b 48 c7 c7 80 c7 ad a2 e8 ae c2 1f 00 48 c7 c6 c0 96 4c a2 48 89 cf e8 fb 22 f6 ff <0f> 0b 48 c7 c7 00 c8 ad a2 e8 91 c2 1f 00 48 85 db 0f 84 3c ff ff
[  417.158165][ T1946] RSP: 0018:ffff88832ba37930 EFLAGS: 00010292
[  417.184809][ T1946] RAX: 0000000000000000 RBX: ffff88847fff36c0 RCX: ffffffffa1b40b78
[  417.220249][ T1946] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88884743d380
[  417.255589][ T1946] RBP: ffff88832ba37988 R08: ffffed1108e87a71 R09: ffffed1108e87a70
[  417.290652][ T1946] R10: ffffed1108e87a70 R11: ffff88884743d387 R12: 0000000000060000
[  417.325808][ T1946] R13: 0000000000060000 R14: 0000000000060000 R15: 000000000005a800
[  417.360953][ T1946] FS:  00007fca293e7740(0000) GS:ffff888847400000(0000) knlGS:0000000000000000
[  417.401830][ T1946] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  417.430817][ T1946] CR2: 0000558da8ffffc0 CR3: 00000002bff10006 CR4: 00000000001606a0
[  417.470406][ T1946] Kernel panic - not syncing: Fatal exception
[  417.497018][ T1946] Kernel Offset: 0x20600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  417.548754][ T1946] ---[ end Kernel panic - not syncing: Fatal exception ]---

[2]
[    0.000000][    T0] WARNING: CPU: 0 PID: 0 at arch/powerpc/mm/pgtable.c:186 set_pte_at+0x3c/0x190
[    0.000000][    T0] Modules linked in:
[    0.000000][    T0] CPU: 0 PID: 0 Comm: swapper Tainted: G        W         5.2.0-rc4+ #7
[    0.000000][    T0] NIP:  c00000000006129c LR: c000000000075724 CTR: c000000000061270
[    0.000000][    T0] REGS: c0000000016d7770 TRAP: 0700   Tainted: G        W          (5.2.0-rc4+)
[    0.000000][    T0] MSR:  9000000000021033 <SF,HV,ME,IR,DR,RI,LE>  CR: 44002884  XER: 20040000
[    0.000000][    T0] CFAR: c00000000005d514 IRQMASK: 1
[    0.000000][    T0] GPR00: c000000000075724 c0000000016d7a00 c0000000016d4900 c0000000016a48b0
[    0.000000][    T0] GPR04: c00c0000003d0000 c000001bff5300e8 8e014b001c000080 ffffffffffffffff
[    0.000000][    T0] GPR08: c000001bff530000 06000000000000c0 07000000000000c0 0000000000000001
[    0.000000][    T0] GPR12: c000000000061270 c000000002b30000 c0000000009e8830 c0000000009e8860
[    0.000000][    T0] GPR16: 0000000000000009 0000000000000009 c000001ffffca000 0000000000000000
[    0.000000][    T0] GPR20: 0000000000000015 0000000000000000 0000000000000000 c000001ffffc9000
[    0.000000][    T0] GPR24: c0000000016a48b0 c0000000018a07c0 0000000000000005 c00c0000003d0000
[    0.000000][    T0] GPR28: 800000000000018e 8000001c004b018e c000001bff5300e8 0000000000000008
[    0.000000][    T0] NIP [c00000000006129c] set_pte_at+0x3c/0x190
[    0.000000][    T0] LR [c000000000075724] __map_kernel_page+0x7a4/0x890
[    0.000000][    T0] Call Trace:
[    0.000000][    T0] [c0000000016d7a00] [0000000400000000] 0x400000000 (unreliable)
[    0.000000][    T0] [c0000000016d7a40] [0000001c004b0000] 0x1c004b0000
[    0.000000][    T0] [c0000000016d7af0] [c0000000008b858c] radix__vmemmap_create_mapping+0x98/0xbc
[    0.000000][    T0] [c0000000016d7b70] [c0000000008b7194] vmemmap_populate+0x284/0x31c
[    0.000000][    T0] [c0000000016d7c30] [c0000000008baeb0] sparse_mem_map_populate+0x40/0x68
[    0.000000][    T0] [c0000000016d7c60] [c000000000af5e10] sparse_init_nid+0x35c/0x550
[    0.000000][    T0] [c0000000016d7d20] [c000000000af63b0] sparse_init+0x1a8/0x240
[    0.000000][    T0] [c0000000016d7d60] [c000000000ac67b0] initmem_init+0x368/0x40c
[    0.000000][    T0] [c0000000016d7e80] [c000000000aba9b8] setup_arch+0x300/0x380
[    0.000000][    T0] [c0000000016d7ef0] [c000000000ab3fd8] start_kernel+0xb4/0x710
[    0.000000][    T0] [c0000000016d7f90] [c00000000000ab74] start_here_common+0x1c/0x4a8