Re: [PATCH] kernel/signal: Remove no longer required irqsave/restore

From: Paul E. McKenney
Date: Fri May 04 2018 - 16:36:04 EST


On Fri, May 04, 2018 at 03:08:40PM -0500, Eric W. Biederman wrote:
> "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> writes:
>
> > On Fri, May 04, 2018 at 02:03:04PM -0500, Eric W. Biederman wrote:
> >> "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> writes:
> >>
> >> > On Fri, May 04, 2018 at 12:17:20PM -0500, Eric W. Biederman wrote:
> >> >> Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> writes:
> >> >>
> >> >> > On 2018-05-04 11:59:08 [-0500], Eric W. Biederman wrote:
> >> >> >> Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> writes:
> >> >> >> > From: Anna-Maria Gleixner <anna-maria@xxxxxxxxxxxxx>
> >> >> > â
> >> >> >> > This long-term fix has been made in commit 4abf91047cf ("rtmutex: Make >
> >> >> >> > wait_lock irq safe") for different reason.
> >> >> >>
> >> >> >> Which tree has this change been made in? I am not finding the commit
> >> >> >> you mention above in Linus's tree.
> >> >> >
> >> >> > I'm sorry, it should have been commit b4abf91047cf ("rtmutex: Make
> >> >> > wait_lock irq safe").
> >> >>
> >> >> Can you fix that in your patch description and can you also up the
> >> >> description of rcu_read_unlock?
> >> >>
> >> >> If we don't need to jump through hoops it looks very reasonable to
> >> >> remove this unnecessary logic. But we should fix the description
> >> >> in rcu_read_unlock that still says we need these hoops.
> >> >
> >> > The hoops are still required for rcu_read_lock(), otherwise you
> >> > get deadlocks between the scheduler and RCU in PREEMPT=y kernels.
> >> > What happens with this patch (if I understand it correctly) is that the
> >> > signal code now uses a different way of jumping through the hoops.
> >> > But the hoops are still jumped through.
> >>
> >> The patch changes:
> >>
> >> local_irq_disable();
> >> rcu_read_lock();
> >> spin_lock();
> >> rcu_read_unlock();
> >>
> >> to:
> >>
> >> rcu_read_lock();
> >> spin_lock_irq();
> >> rcu_read_unlock();
> >>
> >> Now that I have a chance to relfect on it the fact that the patern
> >> that is being restored does not work is scary. As the failure has
> >> nothing to do with lock ordering and people won't realize what is going
> >> on. Especially since the common rcu modes won't care.
> >>
> >> So is it true that taking spin_lock_irq before calling rcu_read_unlock
> >> is a problem because of rt_mutex_unlock()? Or has b4abf91047cf ("rtmutex: Make
> >> wait_lock irq safe") actually fixed that and we can correct the
> >> documentation of rcu_read_unlock() ? And fix __lock_task_sighand?
> >
> > The problem is that the thing taking the lock might be the scheduler,
> > or one of the locks taken while the scheduler's pi and rq locks are
> > held. This occurs only with RCU-preempt.
> >
> > Here is what can happen:
> >
> > o A task does rcu_read_lock().
> >
> > o That task is preempted.
> >
> > o That task stays preempted for a long time, and is therefore
> > priority boosted. This boosting involves a high-priority RCU
> > kthread creating an rt_mutex, pretending that the preempted task
> > already holds it, and then acquiring it.
> >
> > o The task awakens, acquires the scheduler's rq lock, and
> > then does rcu_read_unlock().
> >
> > o Because the task has been priority boosted, __rcu_read_unlock()
> > invokes the rcu_read_unlock_special() slowpath, which does
> > (as you say) rt_mutex_unlock() to deboost. The deboosting
> > can cause the scheduler to acquire the rq and pi locks, which
> > results in deadlock.
> >
> > In contrast, holding these scheduler locks across the entirety of the
> > RCU-preempt read-side critical section is harmless because then the
> > critical section cannot be preempted, which means that priority boosting
> > cannot happen, which means that there will be no need to deboost at
> > rcu_read_unlock() time.
> >
> > This restriction has not changed, and as far as I can see is inherent
> > in the fact that RCU uses the scheduler and the scheduler uses RCU.
> > There is going to be an odd corner case in there somewhere!
>
> However if I read things correctly b4abf91047cf ("rtmutex: Make
> wait_lock irq safe") did change this.
>
> In particular it changed things so that it is only the scheduler locks
> that matter, not any old lock that disabled interrupts. This was done
> by disabling disabling interrupts when taking the wait_lock.
>
> The rcu_read_unlock documentation states:
>
> * In most situations, rcu_read_unlock() is immune from deadlock.
> * However, in kernels built with CONFIG_RCU_BOOST, rcu_read_unlock()
> * is responsible for deboosting, which it does via rt_mutex_unlock().
> * Unfortunately, this function acquires the scheduler's runqueue and
> * priority-inheritance spinlocks. This means that deadlock could result
> * if the caller of rcu_read_unlock() already holds one of these locks or
> * any lock that is ever acquired while holding them; or any lock which
> * can be taken from interrupt context because rcu_boost()->rt_mutex_lock()
> * does not disable irqs while taking ->wait_lock.
>
> So we can now remove the clause:
> * ; or any lock which
> * can be taken from interrupt context because rcu_boost()->rt_mutex_lock()
> * does not disable irqs while taking ->wait_lock.
>
> Without the any lock that disabled interrupts restriction it is now safe
> to not worry about the issues with the scheduler locks and the rt_mutex
> Which does make it safe to not worry about these crazy complexities in
> lock_task_sighand.
>
> Paul do you agree or is the patch unsafe?

Ah, I thought you were trying to get rid of all but the first line of
that paragraph, not just the final clause. Apologies for my confusion!

It looks plausible, but the patch should be stress-tested on a preemptible
kernel with priority boosting enabled. Has that been done?

(Me, I would run rcutorture scenario TREE03 for an extended time period
on b4abf91047cf with your patch applied. But what testing have you
done already?)

Thanx, Paul