Re: tree rcu: call_rcu scalability problem?

From: Paul E. McKenney
Date: Thu Sep 03 2009 - 01:14:37 EST


On Wed, Sep 02, 2009 at 09:17:44PM +0200, Peter Zijlstra wrote:
> On Wed, 2009-09-02 at 14:27 +0200, Nick Piggin wrote:
>
> > It seems like nearly 2/3 of the cost is here:
> > /* Add the callback to our list. */
> > *rdp->nxttail[RCU_NEXT_TAIL] = head; <<<
> > rdp->nxttail[RCU_NEXT_TAIL] = &head->next;
> >
> > In loading the pointer to the next tail pointer. If I'm reading the profile
> > correctly. Can't see why that should be a problem though...
> >
> > ffffffff8107dee0 <__call_rcu>: /* __call_rcu total: 320971 100.000 */
> > 697 0.2172 :ffffffff8107dee0: push %r12
>
> > 921 0.2869 :ffffffff8107df57: push %rdx
> > 151 0.0470 :ffffffff8107df58: popfq
> > 183507 57.1725 :ffffffff8107df59: mov 0x50(%rbx),%rax
> > 995 0.3100 :ffffffff8107df5d: mov %rdi,(%rax)
>
> I'd guess at popfq to be the expensive op here.. skid usually causes the
> attribution to be a few ops down the line.

I believe that Nick's workload is routinely driving the number of
callbacks queued on a given CPU above 10,000, which would provoke numerous
(and possibly inlined) calls to force_quiescent_state(). Like about
400,000 such calls per second. Hey, I was naively assuming that no one
would see more than 10,000 callbacks queued on a single CPU unless there
was some sort of major emergency underway, and coded accordingly. ;-)
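
To make the arithmetic concrete, here is a minimal userspace sketch of that
behavior. It is not the kernel code: QHIMARK, struct cpu_queue,
fake_force_quiescent_state(), enqueue_callback() and forced_calls_after() are
all names made up for illustration, and the sketch assumes the check is a
plain "queue length above the mark" test made on every enqueue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical threshold standing in for the 10,000-callback mark. */
#define QHIMARK 10000L

/* Toy per-CPU callback queue: only the counters matter for this sketch. */
struct cpu_queue {
    long qlen;       /* callbacks currently queued */
    long fqs_calls;  /* times the expensive path was taken */
};

/* Stand-in for force_quiescent_state(): it only counts the call. */
static void fake_force_quiescent_state(struct cpu_queue *q)
{
    q->fqs_calls++;
}

/* Enqueue one callback; once over the mark, every enqueue pays the extra call. */
static void enqueue_callback(struct cpu_queue *q)
{
    assert(q != NULL);
    q->qlen++;
    if (q->qlen > QHIMARK)
        fake_force_quiescent_state(q);
}

/* Queue n callbacks with no draining and report how many forced calls resulted. */
static long forced_calls_after(long n)
{
    struct cpu_queue q = { 0, 0 };

    for (long i = 0; i < n; i++)
        enqueue_callback(&q);
    return q.fqs_calls;
}
```

Under these assumptions, a queue that stays above the mark turns every
further enqueue into one forced call, so the forced-call rate tracks the
call_rcu() rate for as long as the backlog persists.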

I offer the attached experimental (untested, might not even compile) patch.

Thanx, Paul

------------------------------------------------------------------------