Re: Warning in irq_work_queue_on()

From: Frederic Weisbecker
Date: Wed Sep 02 2015 - 17:50:29 EST


On Wed, Sep 02, 2015 at 03:44:05PM -0400, Tejun Heo wrote:
> (cc'ing peterz)
>
> Ooh, this is from irq_work which doesn't have much to do with
> workqueue. Peter?
>
> On Mon, Aug 24, 2015 at 05:16:11PM -0700, Paul E. McKenney wrote:
> > Hello, Tejun,
> >
> > As discussed last week, I am getting an occasional warning out of
> > irq_work_queue_on() WARN_ON_ONCE(cpu_is_offline(cpu)). The repeat-by
> > seems to be a week or so of rcutorture runs on 16-CPU KVM instances
> > on x86. So please see below on the off-chance that this is of use.
> > I have also attached a .config file.
> >
> > Thoughts?
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > [ 875.702254] ------------[ cut here ]------------
> > [ 875.703111] WARNING: CPU: 0 PID: 768 at /home/paulmck/public_git/bisect-linux-rcu/kernel/irq_work.c:69 irq_work_queue_on+0xd4/0x110()
> > [ 875.703227] Modules linked in:
> > [ 875.703227] CPU: 0 PID: 768 Comm: rcu_torture_rea Tainted: G W 4.1.0-rc4+ #1
> > [ 875.703227] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > [ 875.703227] ffffffff81baadd8 ffff88001dc5fce8 ffffffff81895418 00000000000000aa
> > [ 875.703227] 0000000000000000 ffff88001dc5fd28 ffffffff810517d5 0000000000015bc0
> > [ 875.703227] 0000000000000004 0000000000000004 ffff88001fc8f980 ffff88001fc8d500
> > [ 875.703227] Call Trace:
> > [ 875.703227] [<ffffffff81895418>] dump_stack+0x45/0x57
> > [ 875.703227] [<ffffffff810517d5>] warn_slowpath_common+0x85/0xc0
> > [ 875.703227] [<ffffffff810518b5>] warn_slowpath_null+0x15/0x20
> > [ 875.703227] [<ffffffff811119a4>] irq_work_queue_on+0xd4/0x110
> > [ 875.703227] [<ffffffff810c2d74>] tick_nohz_full_kick_cpu+0x44/0x50

It happens in nohz full, but I'm not sure nohz full is the culprit.

The problem here is that wake_up_nohz_cpu() selects a CPU that is offline.
But this shouldn't happen. Either it selects a CPU from the domain tree,
and I suspect offline CPUs aren't supposed to be there, or it selects the
current CPU. And if that CPU has been offlined, it shouldn't be running a
kthread...
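
To make the two selection paths above concrete, here is a minimal user-space
sketch in C. It is only a model under stated assumptions: struct cpu_model,
pick_timer_target() and kick_cpu() are hypothetical names made up for
illustration, not kernel symbols, and the real selection logic is more
involved than a linear scan.

```c
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 4

/* Hypothetical model state; none of these are kernel symbols. */
struct cpu_model {
	bool online[NR_CPUS];    /* stand-in for the CPU online mask */
	bool idle[NR_CPUS];      /* stand-in for per-CPU idle state */
	bool in_domain[NR_CPUS]; /* CPUs reachable through the domain tree */
};

/*
 * Path 1: prefer a non-idle CPU found in the domain tree.
 * Path 2: otherwise fall back to the CPU we are running on.
 */
static int pick_timer_target(const struct cpu_model *m, int this_cpu)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (m->in_domain[cpu] && !m->idle[cpu])
			return cpu;
	}
	return this_cpu;
}

/*
 * Models the check behind the warning: kicking an offline CPU is
 * reported and refused. Returns true if the kick would be allowed.
 */
static bool kick_cpu(const struct cpu_model *m, int cpu)
{
	if (!m->online[cpu]) {
		fprintf(stderr, "WARNING: kick of offline CPU %d\n", cpu);
		return false;
	}
	return true;
}
```

In this model the warning can only fire in two ways: in_domain[] still lists
a CPU whose online[] entry is already false, or the caller itself runs on a
CPU marked offline. Those are the two cases I would expect to be impossible.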

> > [ 875.703227] [<ffffffff81076384>] wake_up_nohz_cpu+0xb4/0x100
> > [ 875.703227] [<ffffffff810b1196>] internal_add_timer+0x86/0xa0
> > [ 875.703227] [<ffffffff810b30f1>] mod_timer+0xf1/0x1e0
> > [ 875.703227] [<ffffffff810a63a4>] rcu_torture_reader+0x2a4/0x2e0
> > [ 875.703227] [<ffffffff810a63e0>] ? rcu_torture_reader+0x2e0/0x2e0
> > [ 875.703227] [<ffffffff810a6100>] ? rcutorture_trace_dump.part.10+0x20/0x20
> > [ 875.703227] [<ffffffff8106d75d>] kthread+0xcd/0xf0
> > [ 875.703227] [<ffffffff8106d690>] ? kthread_create_on_node+0x180/0x180
> > [ 875.703227] [<ffffffff8189fb92>] ret_from_fork+0x42/0x70
> > [ 875.703227] [<ffffffff8106d690>] ? kthread_create_on_node+0x180/0x180
> > [ 875.703227] ---[ end trace 74175128740d0113 ]---