Re: [PATCH-tip 00/22] locking/rwsem: Rework rwsem-xadd & enable new rwsem features

From: Waiman Long
Date: Fri Feb 08 2019 - 15:31:26 EST


On 02/08/2019 02:50 PM, Linus Torvalds wrote:
> On Thu, Feb 7, 2019 at 11:08 AM Waiman Long <longman@xxxxxxxxxx> wrote:
>> This patchset revamps the current rwsem-xadd implementation to make
>> it saner and easier to work with. This patchset removes all the
>> architecture specific assembly code and uses generic C code for all
>> architectures. This eases maintenance and enables us to enhance the
>> code more easily.
>>
>> This patchset also implements the following 3 new features:
>>
>> 1) Waiter lock handoff
>> 2) Reader optimistic spinning
>> 3) Store write-lock owner in the atomic count (x86-64 only)
> The patches are kind of hard to read, with most of them just doing
> prep-work that doesn't necessarily matter to the big picture.
>
> What I'd really like to see is
>
> (a) an overview of the new locking logic

The new locking logic is similar to qrwlock (see patch 11). Cmpxchg is
used to acquire the write lock, while xadd is still used for read lock.
Some of the bits in the count are also reserved for special purpose like
has waiter or lock handoff. Patch 15 tries to compress the write-lock
owner task pointer and put it into the count field for x86-64 at the
expense of less bits available for reader count. I have sent out an
additional patch this morning to make sure that the reader count won't
overflow.

In term of performance, there isn't much change with respect to
read-lock performance. For write-lock, I saw a slight drop in some
cases, but nothing significant. The merging of owner task pointer into
the count field does impose a slightly bigger drop than I would have
liked which I am going to look into a bit more.

>
> (b) what's the new fastpath case

The only change in the fastpath is the use of cmpxchg for writer lock.

>
> (c) some performance numbers

There are performance data at patches 11, 12, 15, 19, 20, 21. There was
performance data for patch 4 as well for eliminating the arch specific
file. Apparently, I might have deleted it accidentally. Anyway, no
noticeable performance difference was observed when switching to use
generic C code for x86, ppc and ARM64.

The major gain in performance is due to reader optimistic spinning
patches. The microbenchmark that I used shown an order of magnitude of
performance improvement for mixed reader-writer workloads. Of course, we
will see less performance gain with real world benchmarks.

I am planning to run more performance test and post the data sometimes
next week. Davidlohr is also going to run some of his rwsem performance
test on this patchset.

>
> to explain the changes from a "this is the point of the whole
> exercise" standpoint.
>
> And yes, I realize that the lock handoff and optimistic spinning is a
> big deal, since I've seen the same regression numbers that presumably
> caused this effort to be resurrected. So it's not that I don't find
> this intriguing and worthwhile, it's literally that I'd like a summary
> not so much of the individual patches, but of the new model.
>
> Please?

Maybe I should break this patchset into a few smaller ones to make it
easier to review. Any suggestion is welcome.

Cheers,
Longman