Re: [4.14-rc1 x86] WARNING: kernel stack regs at f60bbb12 in swapper:1 has bad 'bp' value 0ba00000

From: Tetsuo Handa
Date: Tue Oct 03 2017 - 11:11:11 EST


Josh Poimboeuf wrote:
> On Tue, Oct 03, 2017 at 10:44:13PM +0900, Tetsuo Handa wrote:
> > Josh Poimboeuf wrote:
> >
> > > On Tue, Oct 03, 2017 at 12:37:44PM +0200, Borislav Petkov wrote:
> > > > On Tue, Oct 03, 2017 at 07:29:36PM +0900, Tetsuo Handa wrote:
> > > > > Tetsuo Handa wrote:
> > > > > > Tetsuo Handa wrote:
> > > > > > > Tetsuo Handa wrote:
> > > > > > > > I'm seeing below error between
> > > > > > > > 4898b99c261efe32 ("Merge tag 'acpi-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm") (git bisect good (presumably))
> > > > > > > > e6f3faa734a00c60 ("locking/lockdep: Fix workqueue crossrelease annotation") (git bisect bad) on linux.git .
> > > > > > >
> > > > > > > F.Y.I. This error remains as of 46c1e79fee417f15 ("Merge branch 'perf-urgent-for-linus' of
> > > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip") on linux.git .
> > > > > > >
> > > > > >
> > > > > > This error still remains as of 6e80ecdddf4ea6f3 ("Merge branch 'libnvdimm-fixes'
> > > > > > of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm") on linux.git .
> > > > > >
> > > > > > > I suspect that this error is making the x86_32 kernel very unstable.
> > > > > > > It seems that this error also occurs (though rarely) on x86_64 kernels.
> > > > > >
> > > > > > Nobody cares?
> > > > > >
> > > > > 4.14-rc3 still trivially panics due to this error. Is this problem known?
> > >
> > > Can you try with the following patch? It should hopefully give more
> > > useful information in the dump.
> > >
> > I see. Here is the result.
>
> Hm, that's not what I expected to happen... I suspect this is stack
> corruption, with the result being slightly different every time. Can
> you see if this patch fixes the panic?

This patch did not fix the problem, but disabling CONFIG_PROVE_LOCKING seems
to avoid it. Since the "git log 4898b99c261efe32...e6f3faa734a00c60"
range includes lockdep changes, this might be a lockdep problem.

----------
# diff .config.old .config
2132c2132
< CONFIG_PROVE_LOCKING=y
---
> # CONFIG_PROVE_LOCKING is not set
2135,2136d2134
< CONFIG_LOCKDEP_CROSSRELEASE=y
< CONFIG_LOCKDEP_COMPLETIONS=y
2142d2139
< CONFIG_TRACE_IRQFLAGS=y
2157c2154
< CONFIG_PROVE_RCU=y
---
> # CONFIG_PROVE_RCU is not set
----------
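
Note that turning off CONFIG_PROVE_LOCKING also dropped
CONFIG_LOCKDEP_CROSSRELEASE, CONFIG_LOCKDEP_COMPLETIONS, CONFIG_TRACE_IRQFLAGS
and CONFIG_PROVE_RCU, as the diff above shows. For anyone who wants to list such
side effects by option name rather than by line offset, here is a minimal
Python sketch. The function names parse_config and config_changes are
hypothetical helpers written for this mail, not an existing kernel tool
(scripts/diffconfig in the kernel tree does a similar job), and the sketch
assumes the usual "CONFIG_FOO=value" / "# CONFIG_FOO is not set" line format.

```python
import re

# Assumed .config line formats:
#   CONFIG_FOO=value
#   # CONFIG_FOO is not set
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map option name -> value; 'is not set' options map to None."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = None
    return options


def config_changes(old_text, new_text):
    """Return {name: (old, new)} for every option whose state differs.

    An option that is absent from one file is treated the same as
    'is not set' (None), which is how the diff above reports options
    that vanished together with CONFIG_PROVE_LOCKING.
    """
    old = parse_config(old_text)
    new = parse_config(new_text)
    changes = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes
```

Calling config_changes() with the contents of .config.old and .config returns
only the options whose state changed, so options that merely moved to a
different line number are not reported.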

Maybe there is a bug in completion and/or crossrelease handling?