Re: [PATCH] kernel/signal: Remove no longer required irqsave/restore

From: Eric W. Biederman
Date: Sat May 05 2018 - 00:39:05 EST

Next message: Dave Hansen: "Re: [PATCH v13 3/3] mm, powerpc, x86: introduce an additional vma bit for powerpc pkey"
Previous message: Matthew Wilcox: "Re: *alloc API changes"
In reply to: Paul E. McKenney: "Re: [PATCH] kernel/signal: Remove no longer required irqsave/restore"
Next in thread: Paul E. McKenney: "Re: [PATCH] kernel/signal: Remove no longer required irqsave/restore"

"Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> writes:

> On Fri, May 04, 2018 at 03:08:40PM -0500, Eric W. Biederman wrote:
>> "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> writes:
>>
>> > On Fri, May 04, 2018 at 02:03:04PM -0500, Eric W. Biederman wrote:
>> >> "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> writes:
>> >>
>> >> > On Fri, May 04, 2018 at 12:17:20PM -0500, Eric W. Biederman wrote:
>> >> >> Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> writes:
>> >> >>
>> >> >> > On 2018-05-04 11:59:08 [-0500], Eric W. Biederman wrote:
>> >> >> >> Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> writes:
>> >> >> >> > From: Anna-Maria Gleixner <anna-maria@xxxxxxxxxxxxx>
>> >> >> > …
>> >> >> >> > This long-term fix has been made in commit 4abf91047cf ("rtmutex: Make
>> >> >> >> > wait_lock irq safe") for a different reason.
>> >> >> >>
>> >> >> >> Which tree has this change been made in? I am not finding the commit
>> >> >> >> you mention above in Linus's tree.
>> >> >> >
>> >> >> > I'm sorry, it should have been commit b4abf91047cf ("rtmutex: Make
>> >> >> > wait_lock irq safe").
>> >> >>
>> >> >> Can you fix that in your patch description and can you also update the
>> >> >> description of rcu_read_unlock?
>> >> >>
>> >> >> If we don't need to jump through hoops it looks very reasonable to
>> >> >> remove this unnecessary logic. But we should fix the description
>> >> >> in rcu_read_unlock that still says we need these hoops.
>> >> >
>> >> > The hoops are still required for rcu_read_lock(), otherwise you
>> >> > get deadlocks between the scheduler and RCU in PREEMPT=y kernels.
>> >> > What happens with this patch (if I understand it correctly) is that the
>> >> > signal code now uses a different way of jumping through the hoops.
>> >> > But the hoops are still jumped through.
>> >>
>> >> The patch changes:
>> >>
>> >> local_irq_disable();
>> >> rcu_read_lock();
>> >> spin_lock();
>> >> rcu_read_unlock();
>> >>
>> >> to:
>> >>
>> >> rcu_read_lock();
>> >> spin_lock_irq();
>> >> rcu_read_unlock();
>> >>
>> >> Now that I have a chance to reflect on it, the fact that the pattern
>> >> that is being restored does not work is scary, as the failure has
>> >> nothing to do with lock ordering and people won't realize what is going
>> >> on. Especially since the common rcu modes won't care.
>> >>
>> >> So is it true that taking spin_lock_irq before calling rcu_read_unlock
>> >> is a problem because of rt_mutex_unlock()? Or has b4abf91047cf ("rtmutex: Make
>> >> wait_lock irq safe") actually fixed that and we can correct the
>> >> documentation of rcu_read_unlock()? And fix __lock_task_sighand?
>> >
>> > The problem is that the thing taking the lock might be the scheduler,
>> > or one of the locks taken while the scheduler's pi and rq locks are
>> > held. This occurs only with RCU-preempt.
>> >
>> > Here is what can happen:
>> >
>> > o A task does rcu_read_lock().
>> >
>> > o That task is preempted.
>> >
>> > o That task stays preempted for a long time, and is therefore
>> >   priority boosted. This boosting involves a high-priority RCU
>> >   kthread creating an rt_mutex, pretending that the preempted task
>> >   already holds it, and then acquiring it.
>> >
>> > o The task awakens, acquires the scheduler's rq lock, and
>> >   then does rcu_read_unlock().
>> >
>> > o Because the task has been priority boosted, __rcu_read_unlock()
>> >   invokes the rcu_read_unlock_special() slowpath, which does
>> >   (as you say) rt_mutex_unlock() to deboost. The deboosting
>> >   can cause the scheduler to acquire the rq and pi locks, which
>> >   results in deadlock.
>> >
>> > In contrast, holding these scheduler locks across the entirety of the
>> > RCU-preempt read-side critical section is harmless because then the
>> > critical section cannot be preempted, which means that priority boosting
>> > cannot happen, which means that there will be no need to deboost at
>> > rcu_read_unlock() time.
>> >
>> > This restriction has not changed, and as far as I can see is inherent
>> > in the fact that RCU uses the scheduler and the scheduler uses RCU.
>> > There is going to be an odd corner case in there somewhere!
>>
>> However if I read things correctly b4abf91047cf ("rtmutex: Make
>> wait_lock irq safe") did change this.
>>
>> In particular it changed things so that it is only the scheduler locks
>> that matter, not any old lock that disabled interrupts. This was done
>> by disabling interrupts when taking the wait_lock.
>>
>> The rcu_read_unlock documentation states:
>>
>> * In most situations, rcu_read_unlock() is immune from deadlock.
>> * However, in kernels built with CONFIG_RCU_BOOST, rcu_read_unlock()
>> * is responsible for deboosting, which it does via rt_mutex_unlock().
>> * Unfortunately, this function acquires the scheduler's runqueue and
>> * priority-inheritance spinlocks. This means that deadlock could result
>> * if the caller of rcu_read_unlock() already holds one of these locks or
>> * any lock that is ever acquired while holding them; or any lock which
>> * can be taken from interrupt context because rcu_boost()->rt_mutex_lock()
>> * does not disable irqs while taking ->wait_lock.
>>
>> So we can now remove the clause:
>> * ; or any lock which
>> * can be taken from interrupt context because rcu_boost()->rt_mutex_lock()
>> * does not disable irqs while taking ->wait_lock.
>>
>> Without the "any lock that disables interrupts" restriction, it is now
>> safe to not worry about the issues with the scheduler locks and the
>> rt_mutex, which does make it safe to not worry about these crazy
>> complexities in lock_task_sighand.
>>
>> Paul do you agree or is the patch unsafe?
>
> Ah, I thought you were trying to get rid of all but the first line of
> that paragraph, not just the final clause. Apologies for my confusion!
>
> It looks plausible, but the patch should be stress-tested on a preemptible
> kernel with priority boosting enabled. Has that been done?
>
> (Me, I would run rcutorture scenario TREE03 for an extended time period
> on b4abf91047cf with your patch applied. But what testing have you
> done already?)

Not my patch. I was just the reviewer asking for some obvious cleanups
so people won't get confused. Sebastian, can you tell Paul what
testing you have done?

Eric
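
The deadlock condition Paul describes in the quoted text can be sketched
as a small userspace model. This is only an illustration under stated
assumptions: every name below (reader_model, model_*, scenario_*) is
hypothetical and none of it is kernel code. It models just the rule
quoted above, namely that rcu_read_unlock() has to deboost only if the
reader was preempted inside its critical section, and that deboosting
needs the scheduler's rq/pi locks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct reader_model {
    bool in_rcu_read_side;  /* between rcu_read_lock() and rcu_read_unlock() */
    bool holds_sched_lock;  /* caller holds the scheduler's rq or pi lock */
    bool boosted;           /* priority boosted while preempted in the section */
};

static void model_rcu_read_lock(struct reader_model *r)
{
    assert(!r->in_rcu_read_side);   /* the model ignores nesting */
    r->in_rcu_read_side = true;
}

static void model_take_sched_lock(struct reader_model *r)
{
    r->holds_sched_lock = true;
}

/*
 * A long preemption inside the critical section leads to a boost.
 * Assumption taken from the quoted explanation: a task holding the
 * scheduler locks cannot be preempted, so no boost happens then.
 */
static bool model_long_preemption(struct reader_model *r)
{
    if (r->in_rcu_read_side && !r->holds_sched_lock)
        r->boosted = true;
    return r->boosted;
}

/*
 * Returns true if this unlock would deadlock in the model: the reader
 * must deboost (which needs the rq/pi locks) while already holding one.
 */
static bool model_rcu_read_unlock(struct reader_model *r)
{
    assert(r->in_rcu_read_side);
    bool needs_deboost = r->boosted;

    r->in_rcu_read_side = false;
    r->boosted = false;
    return needs_deboost && r->holds_sched_lock;
}

/* The failing sequence: lock, get preempted and boosted, take rq lock, unlock. */
static bool scenario_lock_taken_mid_section(void)
{
    struct reader_model r = {0};

    model_rcu_read_lock(&r);
    model_long_preemption(&r);
    model_take_sched_lock(&r);
    return model_rcu_read_unlock(&r);
}

/* The harmless sequence: rq lock held across the whole critical section. */
static bool scenario_lock_held_throughout(void)
{
    struct reader_model r = {0};

    model_take_sched_lock(&r);
    model_rcu_read_lock(&r);
    model_long_preemption(&r);      /* no effect: cannot be preempted */
    return model_rcu_read_unlock(&r);
}
```

In this model scenario_lock_taken_mid_section() reports a deadlock and
scenario_lock_held_throughout() does not, which matches the two cases in
the quoted explanation. The model deliberately says nothing about
ordinary irq-disabling locks such as the one taken in lock_task_sighand;
whether those are safe is exactly the question about b4abf91047cf
discussed above.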
