Re: rcu_process_callbacks irqsoff latency caused by taking spinlock with irqs disabled
From: Paul E. McKenney
Date: Tue Apr 17 2018 - 11:43:10 EST
On Sun, Apr 08, 2018 at 02:06:18PM -0700, Paul E. McKenney wrote:
> On Sat, Apr 07, 2018 at 07:40:42AM +1000, Nicholas Piggin wrote:
> > On Thu, 5 Apr 2018 08:53:20 -0700
> > "Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
[ . . . ]
> > > > Note that rcu doesn't show up consistently at the top, this was
> > > > just one that looked *maybe* like it can be improved. So I don't
> > > > know how reproducible it is.
> > >
> > > Ah, that leads me to wonder whether the hypervisor preempted whoever is
> > > currently holding the lock. Do we have anything set up to detect that
> > > sort of thing?
> >
> > In this case it was running on bare metal, so it was a genuine latency
> > event. It just hasn't been consistently at the top (scheduler has been
> > there, but I'm bringing that down with tuning).
>
> OK, never mind about vCPU preemption, then! ;-)
>
> It looks like I will have other reasons to decrease rcu_node lock
> contention, so let me see what I can do.
And the intermittent contention behavior you saw is plausible
given the current code structure, which avoids contention in the common
case where grace periods follow immediately one after the other, but
does not in the less-likely case where RCU is idle and a bunch of CPUs
simultaneously see the need for a new grace period. I have a fix in
the works which occasionally actually makes it through rcutorture. ;-)
I expect to have something robust enough to post to LKML by the end
of this week.
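As a rough illustration of the contention pattern described above, here is a userspace sketch using pthreads and C11 atomics. It is not the kernel code and not the fix mentioned in this message. The names gp_state, request_gp_naive() and request_gp_checked() are hypothetical, invented for this example. The naive variant takes the shared lock on every request, which is what hurts when many CPUs notice the need for a grace period at the same time. The checked variant first tests a flag without the lock and only locks if no request is pending yet.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for shared grace-period state; not kernel code. */
struct gp_state {
	pthread_mutex_t lock;
	atomic_bool gp_requested;        /* set once a new grace period is requested */
	unsigned long lock_acquisitions; /* protected by lock */
	unsigned long gps_started;       /* protected by lock */
};

static void gp_state_init(struct gp_state *s)
{
	assert(s != NULL);
	pthread_mutex_init(&s->lock, NULL);
	atomic_init(&s->gp_requested, false);
	s->lock_acquisitions = 0;
	s->gps_started = 0;
}

/* Naive: every caller takes the shared lock, even if a request is pending. */
static bool request_gp_naive(struct gp_state *s)
{
	bool started = false;

	pthread_mutex_lock(&s->lock);
	s->lock_acquisitions++;
	if (!atomic_load(&s->gp_requested)) {
		atomic_store(&s->gp_requested, true);
		s->gps_started++;
		started = true;
	}
	pthread_mutex_unlock(&s->lock);
	return started;
}

/* Checked: lockless test first, then lock and recheck before requesting. */
static bool request_gp_checked(struct gp_state *s)
{
	bool started = false;

	/* Fast path: someone already asked, so stay off the lock entirely. */
	if (atomic_load_explicit(&s->gp_requested, memory_order_acquire))
		return false;

	pthread_mutex_lock(&s->lock);
	s->lock_acquisitions++;
	/* Recheck under the lock: another caller may have won the race. */
	if (!atomic_load(&s->gp_requested)) {
		atomic_store_explicit(&s->gp_requested, true, memory_order_release);
		s->gps_started++;
		started = true;
	}
	pthread_mutex_unlock(&s->lock);
	return started;
}
```

Note that the lockless check only reduces contention. Callers that all test the flag before the first one sets it will still queue up on the lock, which is why the truly simultaneous case is the hard one.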
Thanx, Paul