Re: [lockdep] b09be676e0 BUG: unable to handle kernel NULL pointer dereference at 000001f2
From: Josh Poimboeuf
Date: Thu Oct 05 2017 - 10:54:23 EST
On Thu, Oct 05, 2017 at 08:01:46AM -0500, Josh Poimboeuf wrote:
> On Tue, Oct 03, 2017 at 09:54:31AM -0700, Linus Torvalds wrote:
> > On Tue, Oct 3, 2017 at 7:06 AM, Fengguang Wu <fengguang.wu@xxxxxxxxx> wrote:
> > >
> > > This patch triggers a NULL-dereference bug at update_stack_state().
> > > Although its parent commit also has a NULL-dereference bug, however
> > > the call stack looks rather different. Both dmesg files are attached.
> > >
> > > It also triggers this warning, which is being discussed in another
> > > thread, so CC Josh. The full dmesg attached, too.
> > >
> > > Please press Enter to activate this console.
> > > [ 138.605622] WARNING: kernel stack regs at be299c9a in procd:340 has bad 'bp' value 000001be
> > > [ 138.605627] unwind stack type:0 next_sp: (null) mask:0x2 graph_idx:0
> > > [ 138.605631] be299c9a: 299ceb00 (0x299ceb00)
> > > [ 138.605633] be299c9e: 2281f1be (0x2281f1be)
> > > [ 138.605634] be299ca2: 299cebb6 (0x299cebb6)
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> > >
> > > commit b09be676e0ff25bd6d2e7637e26d349f9109ad75
> > > locking/lockdep: Implement the 'crossrelease' feature
> > Can we consider just reverting the crossrelease thing?
> > The apparent stack corruption really worries me, and what worries me
> > most is that commit wasn't even supposed to change anything as far as
> > I can tell - it only adds infrastructure, no actual users that *set*
> > the cross-lock thing.
> > So the fact that it actually seems to cause behavioural changes seems
> > to be _really_ scary, and indicates that the code is completely
> > broken.
> > Or am I missing something?
> So I gave crossrelease a bad rap here. Going back and looking at the
> panics and stack dumps, what I thought was "stack corruption" was
> actually the GCC unaligned stack pointer thing.
> I suspect those commits were implicated in the bisections because they
> started doing more stack traces in general, revealing some existing
> 32-bit unwinder/GCC/frame pointer bugs in the process.
> So I just wanted to clarify that crossrelease seems to be innocent in
> all this. Sorry for the confusion!
Ok, I may have spoken too soon :-)
There were so many issues here that it's been hard for me to untangle
them.
There's one panic which seems different than the others:
BUG: unable to handle kernel NULL pointer dereference at 00000020
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP
CPU: 0 PID: 29697 Comm: umount Not tainted 4.13.0-rc4-00169-gce07a941 #627
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
task: c0a0ba00 task.stack: c0a1e000
EFLAGS: 00010246 CPU: 0
EAX: 00000001 EBX: c0100218 ECX: 00000000 EDX: 00000000
ESI: 00000000 EDI: 00000000 EBP: c0a1fdd8 ESP: c0a1fdc0
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 80050033 CR2: 00000020 CR3: 10a03000 CR4: 00000690
EFLAGS: 00000292 CPU: 0
EAX: 00000000 EBX: 080960f0 ECX: a7f76ff4 EDX: 080960d0
ESI: 080960d0 EDI: 080960f0 EBP: afe22258 ESP: afe22208
DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
Code: b5 20 56 3c b7 0f 84 64 fe ff ff e9 6b fe ff ff 8d b4 26 00 00 00 00 8b 7b 1c c1 ee 03 31 c9 83 05 ac 56 3c b7 01 83 e6 01 89 f2 <8b> 47 20 89 45 ec b8 c0 1f 30 b7 c7 04 24 00 00 00 00 e8 15 89
EIP: iput+0x544/0x650 SS:ESP: 0068:c0a1fdc0
---[ end trace 0bfc95b7cf7c8ea4 ]---
Kernel panic - not syncing: Fatal exception
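As a rough sanity check on the oops above, the faulting instruction can be decoded by hand from the Code: line. The byte marked <8b> is followed by 47 20, and a few bytes earlier there is 8b 7b 1c. The sketch below is only an illustration, assuming 32-bit mode and the plain "MOV r32, r/m32" form with an 8-bit displacement and no SIB byte; the helper name decode_mov_disp8 is made up for this example. The kernel tree's scripts/decodecode is the proper tool for this.

```python
# Hand-decode "8b /r" (MOV r32, r/m32) instructions taken from the oops Code: line.
# Assumptions: 32-bit mode, ModRM mod == 01 (8-bit displacement), no SIB byte.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_disp8(code: bytes):
    """Return (destination register, base register, displacement)."""
    opcode, modrm, disp = code
    if opcode != 0x8B:
        raise ValueError("only opcode 0x8b is handled in this sketch")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only the disp8, non-SIB form is handled")
    if disp >= 0x80:  # sign-extend the 8-bit displacement
        disp -= 0x100
    return REGS[reg], REGS[rm], disp


# Faulting instruction: <8b> 47 20
fault_dst, fault_base, fault_disp = decode_mov_disp8(bytes.fromhex("8b4720"))

# Earlier instruction that loaded the base register: 8b 7b 1c
load_dst, load_base, load_disp = decode_mov_disp8(bytes.fromhex("8b7b1c"))

# Register value copied from the oops (EDI: 00000000)
edi = 0x00000000
fault_address = (edi + fault_disp) & 0xFFFFFFFF

print(f"mov {fault_disp:#x}(%{fault_base}),%{fault_dst}")
print(f"mov {load_disp:#x}(%{load_base}),%{load_dst}")
print(f"fault address: {fault_address:08x}")
```

Under those assumptions the faulting load reads offset 0x20 from EDI, EDI is zero in the register dump, and the resulting address matches CR2 (00000020). EDI itself was loaded from offset 0x1c of EBX shortly before, so the pointer fetched there was NULL when this panic happened.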
And it was bisected to:
ce07a9415f26 ("locking/lockdep: Make check_prev_add() able to handle external stack_trace")
That commit hadn't added the crossrelease feature yet, so it presumably
didn't trigger the extra unwinder issues.
Peter and I found some issues with that patch, and Peter came up with a
fix. It would be good to know if Peter's patch makes that panic go
away.
I've rebased the fixes on top of the ce07a9415f26 commit and attached
them to this email.
Fengguang, if you're still listening, could you please rerun the tests
on top of ce07a9415f26, with the attached patches also applied?