Re: timers: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected

From: Thomas Gleixner
Date: Wed Jan 13 2016 - 04:06:55 EST


Sasha,

On Tue, 12 Jan 2016, Sasha Levin wrote:

Cc'ing Paul, Peter

> While fuzzing with trinity inside a KVM tools guest, running the latest -next
> kernel, I've hit the following lockdep warning:

> [ 3408.474461]  Possible interrupt unsafe locking scenario:
> [ 3408.474461]
> [ 3408.475239]        CPU0                    CPU1
> [ 3408.475809]        ----                    ----
> [ 3408.476380]   lock(&lock->wait_lock);
> [ 3408.476925]                                local_irq_disable();
> [ 3408.477640]                                lock(&(&new_timer->it_lock)->rlock);
> [ 3408.478607]                                lock(&lock->wait_lock);

That comes from rcu_read_unlock:

  rcu_read_unlock()
    rcu_read_unlock_special()
      ...
        rt_mutex_unlock(&rnp->boost_mtx);
          raw_spin_lock(&boost_mtx->wait_lock);

> [ 3408.479445]   <Interrupt>
> [ 3408.479796]     lock(&(&new_timer->it_lock)->rlock);

So the task on CPU0 holds rnp->boost_mtx.wait_lock with interrupts enabled,
and then the interrupt deadlocks on timer->it_lock, which CPU1 holds while
it spins on wait_lock.

We can fix that particular issue in the posix-timer code by making the
locking symmetric:

	rcu_read_lock();
	spin_lock_irq(timer->lock);

	...

	spin_unlock_irq(timer->lock);
	rcu_read_unlock();

instead of:

	rcu_read_lock();
	spin_lock_irq(timer->lock);
	rcu_read_unlock();

	...

	spin_unlock_irq(timer->lock);
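
To illustrate why the ordering matters, here is a minimal userspace sketch,
not kernel code. It only records which lock is taken while another is held.
The names (struct model, model_lock() and friends) are hypothetical, and it
assumes the boosted-reader slow path, where rcu_read_unlock() ends up taking
wait_lock as in the call chain above:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock identifiers for this userspace model; not kernel types. */
enum lock_id { IT_LOCK, WAIT_LOCK, NR_LOCKS };

struct model {
	enum lock_id held[NR_LOCKS];	/* stack of currently held locks */
	int depth;
	bool dep[NR_LOCKS][NR_LOCKS];	/* dep[a][b]: b was taken while a was held */
};

static void model_lock(struct model *m, enum lock_id l)
{
	assert(m->depth < NR_LOCKS);
	/* Record a dependency from every held lock to the new one. */
	for (int i = 0; i < m->depth; i++)
		m->dep[m->held[i]][l] = true;
	m->held[m->depth++] = l;
}

static void model_unlock(struct model *m, enum lock_id l)
{
	assert(m->depth > 0 && m->held[m->depth - 1] == l);
	m->depth--;
}

/* Assumption: models only the slow path in which wait_lock is taken. */
static void model_rcu_read_unlock(struct model *m)
{
	model_lock(m, WAIT_LOCK);
	model_unlock(m, WAIT_LOCK);
}

/* Current ordering: rcu_read_unlock() runs under the timer lock. */
bool asymmetric_nests_wait_lock(void)
{
	struct model m;

	memset(&m, 0, sizeof(m));
	model_lock(&m, IT_LOCK);	/* spin_lock_irq(timer->lock) */
	model_rcu_read_unlock(&m);	/* rcu_read_unlock() */
	model_unlock(&m, IT_LOCK);	/* spin_unlock_irq(timer->lock) */
	return m.dep[IT_LOCK][WAIT_LOCK];
}

/* Symmetric ordering: rcu_read_unlock() runs after the timer lock is dropped. */
bool symmetric_nests_wait_lock(void)
{
	struct model m;

	memset(&m, 0, sizeof(m));
	model_lock(&m, IT_LOCK);	/* spin_lock_irq(timer->lock) */
	model_unlock(&m, IT_LOCK);	/* spin_unlock_irq(timer->lock) */
	model_rcu_read_unlock(&m);	/* rcu_read_unlock() */
	return m.dep[IT_LOCK][WAIT_LOCK];
}
```

In this model only the asymmetric variant records an it_lock -> wait_lock
dependency, which is the edge lockdep complains about.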

But the question is whether this is the only offending code path in the tree.
We can avoid the hassle by making rtmutex->wait_lock irq safe.
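
As a rough illustration of why that would silence the report, the sketch
below is a much simplified userspace model of the check named in the subject
line. It is not lockdep's actual implementation, and struct lock_class,
record_acquire() and scenario_reports() are made-up names. It assumes a
dependency is flagged when the outer lock has been taken in hardirq context
and the inner lock has ever been taken with interrupts enabled:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-lock usage state; a simplification for illustration. */
struct lock_class {
	bool used_in_hardirq;		/* ever taken in hardirq context ("HARDIRQ-safe") */
	bool taken_with_irqs_enabled;	/* ever taken with irqs on ("HARDIRQ-unsafe") */
};

/* Record the context of one acquisition of lock class c. */
void record_acquire(struct lock_class *c, bool in_hardirq, bool irqs_enabled)
{
	/* Hardirq context implies interrupts are disabled in this model. */
	assert(!(in_hardirq && irqs_enabled));
	if (in_hardirq)
		c->used_in_hardirq = true;
	if (irqs_enabled)
		c->taken_with_irqs_enabled = true;
}

/* True if taking inner while holding outer is a HARDIRQ-safe -> HARDIRQ-unsafe order. */
bool unsafe_dependency(const struct lock_class *outer,
		       const struct lock_class *inner)
{
	return outer->used_in_hardirq && inner->taken_with_irqs_enabled;
}

/* Replay the reported scenario, with or without an irq safe wait_lock. */
bool scenario_reports(bool wait_lock_irqsafe)
{
	struct lock_class it_lock = { false, false };
	struct lock_class wait_lock = { false, false };

	/* it_lock is taken from the timer interrupt. */
	record_acquire(&it_lock, true, false);
	/* wait_lock taken in process context: irqs stay on unless it is irq safe. */
	record_acquire(&wait_lock, false, !wait_lock_irqsafe);
	/* wait_lock taken nested under it_lock, where irqs are already off. */
	record_acquire(&wait_lock, false, false);

	return unsafe_dependency(&it_lock, &wait_lock);
}
```

With wait_lock_irqsafe set, wait_lock is never acquired with interrupts
enabled, so the it_lock -> wait_lock dependency is no longer flagged in this
model, whichever code path creates it.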

Thoughts?

Thanks,

tglx