Re: WARN_ON_ONCE() in process_one_work()?

From: Paul E. McKenney
Date: Sat Jun 17 2017 - 13:31:17 EST


On Sat, Jun 17, 2017 at 07:53:14AM -0400, Tejun Heo wrote:
> Hello,
>
> On Fri, Jun 16, 2017 at 10:36:58AM -0700, Paul E. McKenney wrote:
> > And no test failures from yesterday evening. So it looks like we get
> > somewhere on the order of one failure per 138 hours of TREE07 rcutorture
> > runtime with your printk() in the mix.
> >
> > Was the above output from your printk() output of any help?
>
> Yeah, if my suspicion is correct, it'd require new kworker creation
> racing against CPU offline, which would explain why it's so difficult
> to repro. Can you please see whether the following patch resolves the
> issue?

That could explain why only Steve Rostedt and I saw the issue. As far
as I know, we are the only ones who regularly run CPU-hotplug stress
tests. ;-)

I have a weekend-long run going, but will give this a shot overnight on
Monday, Pacific Time. Thank you for putting it together, looking forward
to seeing what it does!

Thanx, Paul

> Thanks.
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 803c3bc274c4..1500217ce4b4 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -980,8 +980,13 @@ struct migration_arg {
>  static struct rq *__migrate_task(struct rq *rq, struct rq_flags *rf,
>  				 struct task_struct *p, int dest_cpu)
>  {
> -	if (unlikely(!cpu_active(dest_cpu)))
> -		return rq;
> +	if (p->flags & PF_KTHREAD) {
> +		if (unlikely(!cpu_online(dest_cpu)))
> +			return rq;
> +	} else {
> +		if (unlikely(!cpu_active(dest_cpu)))
> +			return rq;
> +	}
>
>  	/* Affinity changed (again). */
>  	if (!cpumask_test_cpu(dest_cpu, &p->cpus_allowed))
>
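For readers following along, the check the patch changes can be modeled in
userspace roughly as below. This is only a sketch, not kernel code: the
model_* names, the 8-CPU bitmasks, and MODEL_PF_KTHREAD are made up for
illustration and stand in for cpu_online(), cpu_active(), and PF_KTHREAD.
It assumes the usual hotplug ordering, in which a CPU can be online while
not (or no longer) active.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS    8      /* hypothetical CPU count for the model */
#define MODEL_PF_KTHREAD 0x1u   /* stand-in for the kernel's PF_KTHREAD */

/* One bit per CPU; stand-ins for the kernel's online and active cpumasks. */
struct model_cpu_state {
	unsigned int online_mask;
	unsigned int active_mask;
};

/* Minimal stand-in for struct task_struct: only the flags matter here. */
struct model_task {
	unsigned int flags;
};

static bool model_cpu_online(const struct model_cpu_state *s, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return (s->online_mask >> cpu) & 1u;
}

static bool model_cpu_active(const struct model_cpu_state *s, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return (s->active_mask >> cpu) & 1u;
}

/*
 * Models only the first test in the patched __migrate_task():
 * kernel threads may target any online CPU, while other tasks
 * still need the destination CPU to be active.
 * Returns true if the migration would get past this test.
 */
static bool model_dest_cpu_ok(const struct model_cpu_state *s,
			      const struct model_task *p, int dest_cpu)
{
	if (p->flags & MODEL_PF_KTHREAD)
		return model_cpu_online(s, dest_cpu);
	return model_cpu_active(s, dest_cpu);
}
```

With CPU 1 online but not active, the model lets a task flagged
MODEL_PF_KTHREAD through and rejects an ordinary task, which is the
window a newly created kworker would hit under the old check.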