Re: Linux 4.9.6 ( Restore IO-APIC irq_chip retrigger callback , breaks my box )

From: Mike Galbraith
Date: Sun Feb 12 2017 - 21:33:12 EST


On Mon, 2017-02-13 at 02:26 +0100, Gabriel C wrote:

> [ 5.276704] CPU0
> [ 5.312400] ----
> [ 5.347605] lock(tick_broadcast_lock);
> [ 5.383163] <Interrupt>
> [ 5.418457] lock(tick_broadcast_lock);
> [ 5.454015]
> *** DEADLOCK ***
>
> [ 5.557982] no locks held by cpuhp/0/14.

Oh, that looks familiar...

tick/broadcast: Make tick_broadcast_control() use raw_spinlock_irqsave()

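Otherwise the lock is taken with interrupts enabled here while it is also
taken in hardirq context (tick_broadcast_switch_to_oneshot() from the timer
interrupt), and we end up with the lockdep splat below: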

[ 12.703619] =================================
[ 12.703619] [ INFO: inconsistent lock state ]
[ 12.703621] 4.10.0-rt1-rt #18 Not tainted
[ 12.703622] ---------------------------------
[ 12.703623] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
[ 12.703624] cpuhp/0/23 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 12.703625] (tick_broadcast_lock){?.....}, at: [<ffffffff81123f9a>] tick_broadcast_control+0x5a/0x1a0
[ 12.703632] {IN-HARDIRQ-W} state was registered at:
[ 12.703637] [<ffffffff810e3511>] __lock_acquire+0xa21/0x1550
[ 12.703639] [<ffffffff810e444d>] lock_acquire+0xbd/0x250
[ 12.703642] [<ffffffff81703723>] _raw_spin_lock_irqsave+0x53/0x70
[ 12.703644] [<ffffffff811240f6>] tick_broadcast_switch_to_oneshot+0x16/0x50
[ 12.703646] [<ffffffff811244d9>] tick_switch_to_oneshot+0x59/0xd0
[ 12.703647] [<ffffffff811245e5>] tick_init_highres+0x15/0x20
[ 12.703652] [<ffffffff811134cf>] hrtimer_run_queues+0x9f/0xe0
[ 12.703654] [<ffffffff81111155>] run_local_timers+0x25/0x60
[ 12.703656] [<ffffffff811111bc>] update_process_times+0x2c/0x60
[ 12.703659] [<ffffffff8112256f>] tick_periodic+0x2f/0x100
[ 12.703661] [<ffffffff81122664>] tick_handle_periodic+0x24/0x70
[ 12.703664] [<ffffffff810485d3>] local_apic_timer_interrupt+0x33/0x60
[ 12.703669] [<ffffffff81706f18>] smp_apic_timer_interrupt+0x38/0x50
[ 12.703671] [<ffffffff81705f2d>] apic_timer_interrupt+0x9d/0xb0
[ 12.703672] [<ffffffff81702b54>] mwait_idle+0x94/0x290
[ 12.703676] [<ffffffff8102862f>] arch_cpu_idle+0xf/0x20
[ 12.703677] [<ffffffff817030c1>] default_idle_call+0x31/0x60
[ 12.703681] [<ffffffff810d7575>] do_idle+0x175/0x290
[ 12.703683] [<ffffffff810d79b8>] cpu_startup_entry+0x48/0x50
[ 12.703687] [<ffffffff81046833>] start_secondary+0x133/0x160
[ 12.703689] [<ffffffff810001c4>] verify_cpu+0x0/0xfc
[ 12.703690] irq event stamp: 71
[ 12.703691] hardirqs last enabled at (71): [<ffffffff8170376c>] _raw_spin_unlock_irq+0x2c/0x80
[ 12.703696] hardirqs last disabled at (70): [<ffffffff816ff0dc>] __schedule+0x9c/0x7e0
[ 12.703699] softirqs last enabled at (0): [<ffffffff8107ba11>] copy_process.part.34+0x5f1/0x22d0
[ 12.703700] softirqs last disabled at (0): [< (null)>] (null)
[ 12.703701]
[ 12.703701] other info that might help us debug this:
[ 12.703701] Possible unsafe locking scenario:
[ 12.703701]
[ 12.703701] CPU0
[ 12.703702] ----
[ 12.703702] lock(tick_broadcast_lock);
[ 12.703703] <Interrupt>
[ 12.703704] lock(tick_broadcast_lock);
[ 12.703705]
[ 12.703705] *** DEADLOCK ***
[ 12.703705]
[ 12.703705] no locks held by cpuhp/0/23.
[ 12.703705]
[ 12.703705] stack backtrace:
[ 12.703707] CPU: 0 PID: 23 Comm: cpuhp/0 Not tainted 4.10.0-rt1-rt #18
[ 12.703708] Hardware name: Hewlett-Packard ProLiant DL980 G7, BIOS P66 07/07/2010
[ 12.703709] Call Trace:
[ 12.703715] dump_stack+0x85/0xc8
[ 12.703717] print_usage_bug+0x1ea/0x1fb
[ 12.703719] ? print_shortest_lock_dependencies+0x1c0/0x1c0
[ 12.703721] mark_lock+0x20d/0x290
[ 12.703723] __lock_acquire+0x8e6/0x1550
[ 12.703724] ? __lock_acquire+0x2ce/0x1550
[ 12.703726] ? load_balance+0x1b4/0xaf0
[ 12.703728] lock_acquire+0xbd/0x250
[ 12.703729] ? tick_broadcast_control+0x5a/0x1a0
[ 12.703735] ? efifb_probe+0x170/0x170
[ 12.703736] _raw_spin_lock+0x3b/0x50
[ 12.703737] ? tick_broadcast_control+0x5a/0x1a0
[ 12.703738] tick_broadcast_control+0x5a/0x1a0
[ 12.703740] ? efifb_probe+0x170/0x170
[ 12.703742] intel_idle_cpu_online+0x22/0x100
[ 12.703744] cpuhp_invoke_callback+0x245/0x9d0
[ 12.703747] ? finish_task_switch+0x78/0x290
[ 12.703750] ? check_preemption_disabled+0x9f/0x130
[ 12.703752] cpuhp_thread_fun+0x52/0x110
[ 12.703754] smpboot_thread_fn+0x276/0x320
[ 12.703757] kthread+0x10c/0x140
[ 12.703759] ? smpboot_update_cpumask_percpu_thread+0x130/0x130
[ 12.703760] ? kthread_park+0x90/0x90
[ 12.703762] ret_from_fork+0x2a/0x40
[ 12.709790] intel_idle: lapic_timer_reliable_states 0x2

Signed-off-by: Mike Galbraith <efault@xxxxxx>
---
kernel/time/tick-broadcast.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -357,6 +357,7 @@ void tick_broadcast_control(enum tick_br
 	struct clock_event_device *bc, *dev;
 	struct tick_device *td;
 	int cpu, bc_stopped;
+	unsigned long flags;
 
 	td = this_cpu_ptr(&tick_cpu_device);
 	dev = td->evtdev;
@@ -370,7 +371,7 @@ void tick_broadcast_control(enum tick_br
 	if (!tick_device_is_functional(dev))
 		return;
 
-	raw_spin_lock(&tick_broadcast_lock);
+	raw_spin_lock_irqsave(&tick_broadcast_lock, flags);
 	cpu = smp_processor_id();
 	bc = tick_broadcast_device.evtdev;
 	bc_stopped = cpumask_empty(tick_broadcast_mask);
@@ -420,7 +421,7 @@ void tick_broadcast_control(enum tick_br
 				tick_broadcast_setup_oneshot(bc);
 		}
 	}
-	raw_spin_unlock(&tick_broadcast_lock);
+	raw_spin_unlock_irqrestore(&tick_broadcast_lock, flags);
 }
 EXPORT_SYMBOL_GPL(tick_broadcast_control);