Re: [PATCH] kernel/signal: Remove no longer required irqsave/restore
From: Paul E. McKenney
Date: Sat May 05 2018 - 01:23:59 EST
On Fri, May 04, 2018 at 11:38:37PM -0500, Eric W. Biederman wrote:
> "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> writes:
>
> > On Fri, May 04, 2018 at 03:08:40PM -0500, Eric W. Biederman wrote:
> >> "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> writes:
> >>
> >> > On Fri, May 04, 2018 at 02:03:04PM -0500, Eric W. Biederman wrote:
> >> >> "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> writes:
> >> >>
> >> >> > On Fri, May 04, 2018 at 12:17:20PM -0500, Eric W. Biederman wrote:
> >> >> >> Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> writes:
> >> >> >>
> >> >> >> > On 2018-05-04 11:59:08 [-0500], Eric W. Biederman wrote:
> >> >> >> >> Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> writes:
> >> >> >> >> > From: Anna-Maria Gleixner <anna-maria@xxxxxxxxxxxxx>
> >> >> >> >
> >> >> >> >> > This long-term fix has been made in commit 4abf91047cf ("rtmutex: Make
> >> >> >> >> > wait_lock irq safe") for different reason.
> >> >> >> >>
> >> >> >> >> Which tree has this change been made in? I am not finding the commit
> >> >> >> >> you mention above in Linus's tree.
> >> >> >> >
> >> >> >> > I'm sorry, it should have been commit b4abf91047cf ("rtmutex: Make
> >> >> >> > wait_lock irq safe").
> >> >> >>
> >> >> >> Can you fix that in your patch description and can you also update the
> >> >> >> description of rcu_read_unlock?
> >> >> >>
> >> >> >> If we don't need to jump through hoops it looks very reasonable to
> >> >> >> remove this unnecessary logic. But we should fix the description
> >> >> >> in rcu_read_unlock that still says we need these hoops.
> >> >> >
> >> >> > The hoops are still required for rcu_read_lock(), otherwise you
> >> >> > get deadlocks between the scheduler and RCU in PREEMPT=y kernels.
> >> >> > What happens with this patch (if I understand it correctly) is that the
> >> >> > signal code now uses a different way of jumping through the hoops.
> >> >> > But the hoops are still jumped through.
> >> >>
> >> >> The patch changes:
> >> >>
> >> >> local_irq_disable();
> >> >> rcu_read_lock();
> >> >> spin_lock();
> >> >> rcu_read_unlock();
> >> >>
> >> >> to:
> >> >>
> >> >> rcu_read_lock();
> >> >> spin_lock_irq();
> >> >> rcu_read_unlock();
> >> >>
> >> >> Now that I have a chance to reflect on it, the fact that the pattern
> >> >> that is being restored does not work is scary, as the failure has
> >> >> nothing to do with lock ordering and people won't realize what is going
> >> >> on. Especially since the common rcu modes won't care.
> >> >>
> >> >> So is it true that taking spin_lock_irq before calling rcu_read_unlock
> >> >> is a problem because of rt_mutex_unlock()? Or has b4abf91047cf ("rtmutex: Make
> >> >> wait_lock irq safe") actually fixed that and we can correct the
> >> >> documentation of rcu_read_unlock() ? And fix __lock_task_sighand?
> >> >
> >> > The problem is that the thing taking the lock might be the scheduler,
> >> > or one of the locks taken while the scheduler's pi and rq locks are
> >> > held. This occurs only with RCU-preempt.
> >> >
> >> > Here is what can happen:
> >> >
> >> > o A task does rcu_read_lock().
> >> >
> >> > o That task is preempted.
> >> >
> >> > o That task stays preempted for a long time, and is therefore
> >> > priority boosted. This boosting involves a high-priority RCU
> >> > kthread creating an rt_mutex, pretending that the preempted task
> >> > already holds it, and then acquiring it.
> >> >
> >> > o The task awakens, acquires the scheduler's rq lock, and
> >> > then does rcu_read_unlock().
> >> >
> >> > o Because the task has been priority boosted, __rcu_read_unlock()
> >> > invokes the rcu_read_unlock_special() slowpath, which does
> >> > (as you say) rt_mutex_unlock() to deboost. The deboosting
> >> > can cause the scheduler to acquire the rq and pi locks, which
> >> > results in deadlock.
> >> >
> >> > In contrast, holding these scheduler locks across the entirety of the
> >> > RCU-preempt read-side critical section is harmless because then the
> >> > critical section cannot be preempted, which means that priority boosting
> >> > cannot happen, which means that there will be no need to deboost at
> >> > rcu_read_unlock() time.
> >> >
> >> > This restriction has not changed, and as far as I can see is inherent
> >> > in the fact that RCU uses the scheduler and the scheduler uses RCU.
> >> > There is going to be an odd corner case in there somewhere!
> >>
> >> However if I read things correctly b4abf91047cf ("rtmutex: Make
> >> wait_lock irq safe") did change this.
> >>
> >> In particular it changed things so that it is only the scheduler locks
> >> that matter, not any old lock that disabled interrupts. This was done
> >> by disabling interrupts when taking the wait_lock.
> >>
> >> The rcu_read_unlock documentation states:
> >>
> >> * In most situations, rcu_read_unlock() is immune from deadlock.
> >> * However, in kernels built with CONFIG_RCU_BOOST, rcu_read_unlock()
> >> * is responsible for deboosting, which it does via rt_mutex_unlock().
> >> * Unfortunately, this function acquires the scheduler's runqueue and
> >> * priority-inheritance spinlocks. This means that deadlock could result
> >> * if the caller of rcu_read_unlock() already holds one of these locks or
> >> * any lock that is ever acquired while holding them; or any lock which
> >> * can be taken from interrupt context because rcu_boost()->rt_mutex_lock()
> >> * does not disable irqs while taking ->wait_lock.
> >>
> >> So we can now remove the clause:
> >> * ; or any lock which
> >> * can be taken from interrupt context because rcu_boost()->rt_mutex_lock()
> >> * does not disable irqs while taking ->wait_lock.
> >>
> >> Without the "any lock that disabled interrupts" restriction it is now safe
> >> to not worry about the issues with the scheduler locks and the rt_mutex,
> >> which does make it safe to not worry about these crazy complexities in
> >> lock_task_sighand.
> >>
> >> Paul do you agree or is the patch unsafe?
> >
> > Ah, I thought you were trying to get rid of all but the first line of
> > that paragraph, not just the final clause. Apologies for my confusion!
> >
> > It looks plausible, but the patch should be stress-tested on a preemptible
> > kernel with priority boosting enabled. Has that been done?
> >
> > (Me, I would run rcutorture scenario TREE03 for an extended time period
> > on b4abf91047cf with your patch applied.

And with lockdep enabled, which TREE03 does not do by default.

> > But what testing have you
> > done already?)
>
> Not my patch. I was just the reviewer asking for some obvious cleanups.
> So people won't get confused. Sebastian? Can you tell Paul what
> testing you have done?

Ah, apologies for my confusion!

Thanx, Paul
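
The deadlock scenario quoted above (reader preempted, priority boosted, then rcu_read_unlock() called with the rq lock held) can be sketched as a toy userspace model. This is only an illustration under stated assumptions: every toy_* name is hypothetical, nothing here is kernel code, and a single holds_rq_lock flag stands in for the scheduler's rq and pi locks. It encodes only the rule described in the thread, namely that deboosting at unlock time needs locks the caller may already hold.

```c
#include <stdbool.h>

/* Hypothetical toy model of one task; not a kernel data structure. */
struct toy_task {
	int rcu_depth;		/* nesting depth of toy_rcu_read_lock() */
	bool holds_rq_lock;	/* stands in for the scheduler's rq/pi locks */
	bool boosted;		/* reader was preempted and priority boosted */
};

static void toy_rcu_read_lock(struct toy_task *t)
{
	t->rcu_depth++;
}

/*
 * Model "preempted for a long time": boosting can only happen to a task
 * that is inside a read-side critical section and can be preempted,
 * which in this model means it does not hold the rq lock.
 */
static void toy_preempt_for_a_long_time(struct toy_task *t)
{
	if (!t->holds_rq_lock && t->rcu_depth > 0)
		t->boosted = true;
}

/*
 * Returns true if this unlock would deadlock in the model: leaving the
 * outermost critical section while boosted requires a deboost, and the
 * deboost needs the rq lock that the caller already holds.
 */
static bool toy_rcu_read_unlock(struct toy_task *t)
{
	bool deadlock = false;

	if (--t->rcu_depth == 0 && t->boosted) {
		deadlock = t->holds_rq_lock;
		t->boosted = false;
	}
	return deadlock;
}

/* rcu_read_lock(); preempted and boosted; take rq lock; rcu_read_unlock() */
bool toy_unlock_under_rq_lock(void)
{
	struct toy_task t = { 0 };

	toy_rcu_read_lock(&t);
	toy_preempt_for_a_long_time(&t);
	t.holds_rq_lock = true;
	return toy_rcu_read_unlock(&t);
}

/* rq lock held across the entire read-side critical section */
bool toy_rq_lock_across_reader(void)
{
	struct toy_task t = { 0 };
	bool deadlock;

	t.holds_rq_lock = true;
	toy_rcu_read_lock(&t);
	toy_preempt_for_a_long_time(&t);	/* no effect: cannot be preempted */
	deadlock = toy_rcu_read_unlock(&t);
	t.holds_rq_lock = false;
	return deadlock;
}
```

In this model toy_unlock_under_rq_lock() reports a deadlock, while toy_rq_lock_across_reader() does not, matching the "harmless" case where the scheduler locks are held across the whole critical section. The model says nothing about locks other than the scheduler's, so it cannot substitute for the rcutorture and lockdep testing requested above.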