Re: [PATCH] rwsem: reduce spinlock contention in wakeup code path

From: Ingo Molnar
Date: Sat Sep 28 2013 - 03:41:59 EST



* Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> On Fri, Sep 27, 2013 at 12:00 PM, Waiman Long <Waiman.Long@xxxxxx> wrote:
> >
> > On a large NUMA machine, it is entirely possible that a fairly large
> > number of threads are queuing up on the ticket spinlock to do the
> > wakeup operation, when in fact only one of them is needed. This patch
> > tries to reduce spinlock contention by letting just one thread do it.
> >
> > A new wakeup field is added to the rwsem structure. This field is set
> > on entry to rwsem_wake() and __rwsem_do_wake() to mark that a thread
> > is about to perform the wakeup call. It is cleared on exit from those
> > functions.
>
> Ok, this is *much* simpler than adding the new MCS spinlock, so I'm
> wondering what the performance difference between the two is.
>
> I'm obviously always in favor of just removing lock contention over
> trying to improve the lock scalability, so I really like Waiman's
> approach over Tim's new MCS lock. Not because I dislike MCS locks in
> general (or you, Tim ;), it's really more fundamental: I just
> fundamentally believe more in trying to avoid lock contention than in
> trying to improve lock behavior when that contention happens.
>
> As such, I love exactly these kinds of things that Waiman's patch does,
> and I'm heavily biased.
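
Side note, to make this concrete for those reading along: Waiman's
wakeup-flag idea boils down to something like the kernel-flavored
sketch below. This is not his actual patch - the 'wakeup_pending' field
name and the function body are made up for illustration, and the real
code has to handle races this sketch glosses over, e.g. the flag being
cleared right after another waker checked it:

struct rw_semaphore {			/* sketch only */
	long			count;
	raw_spinlock_t		wait_lock;
	struct list_head	wait_list;
	int			wakeup_pending;	/* hypothetical new field */
};

void rwsem_wake(struct rw_semaphore *sem)
{
	/*
	 * Somebody else is already doing the wakeup; only one thread
	 * is needed, so skip the spinlock instead of queueing on it.
	 */
	if (ACCESS_ONCE(sem->wakeup_pending))
		return;

	/* Set on entry, cleared on exit, as in the patch description: */
	ACCESS_ONCE(sem->wakeup_pending) = 1;

	raw_spin_lock(&sem->wait_lock);
	/* ... wake the waiter(s) at the head of sem->wait_list ... */
	raw_spin_unlock(&sem->wait_lock);

	ACCESS_ONCE(sem->wakeup_pending) = 0;
}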

Yeah, I fully agree. The reason I'm still very sympathetic to Tim's
efforts is that they address a regression caused by a mechanical
mutex->rwsem conversion:

5a505085f043 mm/rmap: Convert the struct anon_vma::mutex to an rwsem

... and Tim's patches turn that regression into an actual speedup.

The main component of the regression happens to be more prominent on
larger systems, which inevitably see higher contention. That was a
result of mutexes having better contention behavior than rwsems - a
property Tim did not pick arbitrarily.

The other advantage is that by factoring out the MCS code it gets
reviewed and seen more prominently - this has already resulted in a
micro-optimization. It also means that people working on mutex and
rwsem scalability will have a chance to help each other.

A third advantage is that our rwsems arguably degrade catastrophically
on really big systems. We had the same problem with spinlocks and
mutexes, and it got fixed. Since rwsems are a long-term concern for us,
I think something like MCS queueing could be considered a fundamental
quality-of-implementation requirement as well.
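
For the curious, the core of an MCS-style queued lock is tiny: each
waiter spins on a flag in its *own* queue node instead of on the shared
lock word, so contention does not turn into global cacheline ping-pong.
Below is a generic sketch using GCC's __atomic builtins - illustrative
only, not Tim's actual kernel patches, and the names are made up:

struct mcs_node {
	struct mcs_node	*next;
	int		locked;
};

void mcs_lock(struct mcs_node **tail, struct mcs_node *me)
{
	struct mcs_node *prev;

	me->next = NULL;
	me->locked = 0;

	/* Swap ourselves in as the new tail of the waiter queue. */
	prev = __atomic_exchange_n(tail, me, __ATOMIC_ACQ_REL);
	if (!prev)
		return;			/* queue was empty: lock acquired */

	/* Link in behind our predecessor, then spin locally. */
	__atomic_store_n(&prev->next, me, __ATOMIC_RELEASE);
	while (!__atomic_load_n(&me->locked, __ATOMIC_ACQUIRE))
		;			/* spinning on our own node only */
}

void mcs_unlock(struct mcs_node **tail, struct mcs_node *me)
{
	struct mcs_node *next = __atomic_load_n(&me->next, __ATOMIC_ACQUIRE);

	if (!next) {
		/* No visible successor: try to reset the queue to empty. */
		struct mcs_node *expect = me;
		if (__atomic_compare_exchange_n(tail, &expect, NULL, 0,
						__ATOMIC_RELEASE,
						__ATOMIC_RELAXED))
			return;		/* nobody was waiting */
		/* A successor is mid-enqueue: wait for it to link in. */
		while (!(next = __atomic_load_n(&me->next, __ATOMIC_ACQUIRE)))
			;
	}
	/* Hand the lock directly to the next waiter. */
	__atomic_store_n(&next->locked, 1, __ATOMIC_RELEASE);
}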

In any case I fully agree that we never want to overdo it; our
optimization priority ranking will always be:

- lockless algorithms
- reduction of critical paths
...
- improved locking fast path
- improved locking slow path

and people will see much better speedups if they concentrate on entries
higher on this list.

It's a pretty rigid order as well: no slowpath improvement is allowed
to hurt any of the higher-priority optimization concepts, and people
are encouraged to always work on the higher-order concepts before
addressing the symptoms of contention.

But as long as contention-behavior improvements are cleanly done, don't
cause regressions, don't hinder primary scalability efforts, and come
with numbers as impressive as Tim's, it looks like good stuff to me.

I was thinking about putting them into the locking tree once the review
and testing process converges, and about sending them to you in the
v3.13 merge window.

The only potential downside (modulo bugs) I can see in Tim's patches is
XFS's heavy dependence on fine behavioral details of rwsems. But if
done right, none of these scalability efforts should impact those
details.

Thanks,

Ingo
--