Re: [RFC] Deadlock via recursive wakeup via RCU with threadirqs
From: Joel Fernandes
Date: Thu Jun 27 2019 - 12:47:39 EST
On Thu, Jun 27, 2019 at 11:55 AM Paul E. McKenney <paulmck@xxxxxxxxxxxxx> wrote:
>
> On Thu, Jun 27, 2019 at 11:30:31AM -0400, Joel Fernandes wrote:
> > On Thu, Jun 27, 2019 at 10:34:55AM -0400, Steven Rostedt wrote:
> > > On Thu, 27 Jun 2019 10:24:36 -0400
> > > Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> > >
> > > > > What am I missing here?
> > > >
> > > > This issue I think is
> > > >
> > > > (in normal process context)
> > > > spin_lock_irqsave(rq_lock); // which disables both preemption and interrupts
> > > > // but this was done in normal process context,
> > > > // not from IRQ handler
> > > > rcu_read_lock();
> > > > <---------- IPI comes in and sets exp_hint
> > >
> > > How would an IPI come in here with interrupts disabled?
> > >
> > > -- Steve
> >
> > That is true. Could it be that rcu_read_unlock_special() got called for some
> > *other* reason than the IPI, then?
> >
> > Per Sebastian's stack trace of the recursive lock scenario, it is happening
> > during cpu_acct_charge(), which is called with the rq_lock held.
> >
> > The only other reasons I know of to call rcu_read_unlock_special() are if
> > 1. the tick indicated that the CPU has to report a QS
> > 2. an IPI in the middle of the reader section for expedited GPs
> > 3. preemption in the middle of a preemptible RCU reader section
>
> 4. Some previous reader section was IPIed or preempted, but either
> interrupts, softirqs, or preemption was disabled across the
> rcu_read_unlock() of that previous reader section.

Hi Paul, I did not fully understand 4. The previous RCU reader section
could not have been IPI'ed or preempted if interrupts were disabled
across it. Also, if softirqs or preemption were disabled across the
previous reader section, the previous reader could not have been
preempted in that case.

That leaves only the scenario where the previous reader was IPI'ed
while softirqs or preemption were disabled across it. Is that what you
meant? But in this scenario, the previous reader should have set
exp_hint to false in its own rcu_read_unlock_special() invocation. So I
would think t->rcu_read_unlock_special should be 0 by the time the new
reader's rcu_read_unlock() runs, and I do not understand how
rcu_read_unlock_special() can be called because of a previous reader.

I'll borrow some of that confused color paint if you don't mind ;-)
And we should document this somewhere for future sanity preservation :-D

thanks,
- Joel
>
> I -think- that this is what Sebastian is seeing.
>
> Thanx, Paul
>
> > 1. and 2. are not possible because interrupts are disabled; that is why the
> > wakeup_softirq even happened.
> > 3. is not possible because we are holding rq_lock in the RCU reader section.
> >
> > So I am at a bit of a loss how this can happen :-(
> >
> > Could this be a spurious call to rcu_read_unlock_special(), made when it
> > should not have been called?
> >
> > thanks,
> >
> > - Joel