[REGRESSION] kmemleak: commit c566586818 causes failure to boot

From: Theodore Y. Ts'o
Date: Sun Oct 13 2019 - 22:26:44 EST


Commit c566586818 ("mm: kmemleak: use the memory pool for early
allocations") causes my test kernels to fail to boot on using both kvm
and using Google Compute Engine. A git bisect localized it to
c566586818, and I confirmed by test building v5.4-rc3, which failed as
above using KVM. When I reverted c566586818 the kernel booted
successfully.

The symptoms are that the boot hangs after:

[ 2.844808] hctosys: unable to open rtc device (rtc0)

and then about 25 seconds later, we get the following warning:

[ 28.237938] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/0:1:7]
[ 28.239345] irq event stamp: 198897938
[ 28.240017] hardirqs last enabled at (198897937): [<ffffffffa0f0e9c3>] _raw_write_unlock_irqrestore+
0x43/0x47
[ 28.241979] hardirqs last disabled at (198897938): [<ffffffffa040180a>] trace_hardirqs_off_thunk+0x1a
/0x20
[ 28.243930] softirqs last enabled at (198876302): [<ffffffffa120032a>] __do_softirq+0x32a/0x42a
[ 28.247350] softirqs last disabled at (198876295): [<ffffffffa04b84e3>] irq_exit+0xb3/0xc0
[ 28.250080] CPU: 0 PID: 7 Comm: kworker/0:1 Not tainted 5.4.0-rc3-xfstests-00403-g4f5cafb5cb84 #1225
[ 28.253081] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 28.254885] Workqueue: events kmemleak_do_cleanup
[ 28.255570] RIP: 0010:_raw_write_unlock_irqrestore+0x45/0x47
[ 28.256401] Code: e8 b0 4d 60 ff 48 89 ef e8 d8 a6 60 ff f6 c7 02 75 11 53 9d e8 dc b1 68 ff 65 ff 0d
cd 73 10 5f 5b 5d c3 e8 ed b0 68 ff 53 9d <eb> ed 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
90 90
[ 28.260440] RSP: 0000:ffff98984006fdf8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[ 28.262258] RAX: ffff94d7fd23a1c0 RBX: 0000000000000246 RCX: 0000000000000006
[ 28.264238] RDX: 0000000000000007 RSI: ffff94d7fd23a9c0 RDI: ffff94d7fd23a1c0
[ 28.267333] RBP: ffffffffa1c94bc0 R08: 00000006931c1cf2 R09: 0000000000000000
[ 28.269871] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 28.272175] R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffa1c94aa8
[ 28.274649] FS: 0000000000000000(0000) GS:ffff94d7fd800000(0000) knlGS:0000000000000000
[ 28.277758] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 28.279638] CR2: 0000000000000000 CR3: 000000005fc12001 CR4: 0000000000360ef0
[ 28.282367] Call Trace:
[ 28.283075] find_and_remove_object+0x7f/0x90
[ 28.284335] delete_object_full+0xc/0x20
[ 28.285488] __kmemleak_do_cleanup+0x63/0x100
[ 28.286913] process_one_work+0x246/0x570
[ 28.288801] worker_thread+0x50/0x3b0
[ 28.290406] ? process_one_work+0x570/0x570
[ 28.291497] kthread+0x126/0x140
[ 28.292316] ? kthread_delayed_work_timer_fn+0xa0/0xa0
[ 28.294262] ret_from_fork+0x3a/0x50
[ 28.837921] rcu: INFO: rcu_sched self-detected stall on CPU
...

I've attached the log from the KVM session and the config.gz used to
build the kernels.

- Ted

Attachment: config.gz
Description: application/gzip

Attachment: log.201910132216.gz
Description: application/gzip