Re: WARN_ON_ONCE() in process_one_work()?

From: Paul E. McKenney
Date: Fri Jun 16 2017 - 13:37:13 EST


On Thu, Jun 15, 2017 at 08:38:57AM -0700, Paul E. McKenney wrote:
> On Wed, Jun 14, 2017 at 08:15:48AM -0700, Paul E. McKenney wrote:
> > On Tue, Jun 13, 2017 at 03:31:03PM -0700, Paul E. McKenney wrote:
> > > On Tue, Jun 13, 2017 at 04:58:37PM -0400, Tejun Heo wrote:
> > > > Hello, Paul.
> > > >
> > > > On Fri, May 05, 2017 at 10:11:59AM -0700, Paul E. McKenney wrote:
> > > > > Just following up... I have hit this bug a couple of times over the
> > > > > past few days. Anything I can do to help?
> > > >
> > > > My apologies for dropping the ball on this. I've gone over the hot
> > > > plug code in workqueue several times but can't really find how this
> > > > would happen. Can you please apply the following patch and see what
> > > > it says when the problem happens?
> > >
> > > I have fired it up, thank you!
> > >
> > > Last time I saw one failure in 21 hours of test runs, so I have kicked
> > > off 42 one-hour test runs. Will see what happens tomorrow morning,
> > > Pacific Time.
> >
> > And none of the 42 runs resulted in a workqueue splat. I will try again
> > this evening, Pacific Time.
> >
> > Who knows, maybe your diagnostic patch is the fix. ;-)
>
> And this time, we did get something! Here is the printk() output:
>
> [ 2126.863410] XXX workfn=vmstat_update pool->cpu/flags=1/0x0 curcpu=2 online=0-2,7 active=0,2,7
>
> Please see below for the full splat from dmesg.
>
> Please let me know if you need additional email. My test ID is KSIC
> 2017.06.14-15:50:08/TREE07.14, just to help me find it in my large pile
> of test results. ;-)

And no test failures from yesterday evening. So it looks like we get
somewhere on the order of one failure per 138 hours of TREE07 rcutorture
runtime with your printk() in the mix.

Was the above output from your printk() of any help?
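
In case it helps anyone going through a pile of these lines, here is a
rough Python sketch that pulls the fields out of the quoted printk() line
and expands the CPU lists. The field layout is inferred only from the one
line quoted above, not from the diagnostic patch itself, and the names
parse_cpulist() and summarize() are made up for illustration.

```python
import re

def parse_cpulist(text):
    """Expand a kernel-style CPU list such as "0-2,7" into a set of ints."""
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

# The diagnostic line exactly as quoted in this thread.
LINE = ("[ 2126.863410] XXX workfn=vmstat_update pool->cpu/flags=1/0x0 "
        "curcpu=2 online=0-2,7 active=0,2,7")

# Assumed layout, inferred from the single sample line above.
PATTERN = re.compile(
    r"workfn=(?P<workfn>\S+) "
    r"pool->cpu/flags=(?P<pool_cpu>-?\d+)/(?P<flags>0x[0-9a-fA-F]+) "
    r"curcpu=(?P<curcpu>\d+) online=(?P<online>\S*) active=(?P<active>\S*)"
)

def summarize(line):
    """Return the parsed fields plus a few derived comparisons."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("line does not match the expected diagnostic format")
    online = parse_cpulist(m["online"])
    active = parse_cpulist(m["active"])
    pool_cpu = int(m["pool_cpu"])
    curcpu = int(m["curcpu"])
    return {
        "workfn": m["workfn"],
        "pool_cpu": pool_cpu,
        "curcpu": curcpu,
        # True when the work item ran on a CPU other than its pool's CPU.
        "on_wrong_cpu": pool_cpu != curcpu,
        "pool_cpu_online": pool_cpu in online,
        "pool_cpu_active": pool_cpu in active,
        # CPUs that are marked online but not active.
        "online_not_active": sorted(online - active),
    }

if __name__ == "__main__":
    for key, value in summarize(LINE).items():
        print(f"{key}: {value}")
```

For the line above, this reports that the work item ran on CPU 2 while its
pool belongs to CPU 1, and that CPU 1 is in the online mask but not in the
active mask. It says nothing about why that happened.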

Thanx, Paul