Re: [PATCH v2] locking/rwsem: Take read lock immediate if queue empty with no writer
From: Waiman Long
Date: Wed Jul 18 2018 - 12:40:36 EST
On 07/18/2018 12:16 PM, Peter Zijlstra wrote:
> On Fri, Jul 13, 2018 at 02:30:53PM -0400, Waiman Long wrote:
>> It was discovered that a constant stream of readers might cause the
>> count to go negative most of the time after an initial trigger by a
>> writer, even if no writer was present afterward. As a result, most of
>> the readers would have to go through the slowpath, reducing their
>> performance.
> I'm slightly confused, what happens to trigger this?
>
> (also, I'm forever confused by the whole BIAS scheme)
As I said before, an occasional writer among a stream of incoming
readers can cause this problem.
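To make the bias arithmetic concrete, these are roughly the 64-bit
values from asm-generic/rwsem.h (quoting from memory, so the exact
spelling may be slightly off):

#define RWSEM_UNLOCKED_VALUE     0x00000000L
#define RWSEM_ACTIVE_BIAS        0x00000001L
#define RWSEM_ACTIVE_MASK        0xffffffffL
#define RWSEM_WAITING_BIAS       (-RWSEM_ACTIVE_MASK-1)  /* -0x100000000 */
#define RWSEM_ACTIVE_READ_BIAS   RWSEM_ACTIVE_BIAS
#define RWSEM_ACTIVE_WRITE_BIAS  (RWSEM_WAITING_BIAS + RWSEM_ACTIVE_BIAS)

With only readers around, the count is just the number of active
readers and stays positive. Any writer, whether it gets the lock or has
to wait, drives the count negative because both RWSEM_ACTIVE_WRITE_BIAS
and RWSEM_WAITING_BIAS dominate the low-order reader counts. Once the
count is negative, readers that come in afterward fail their fast path
and drop into the slowpath, where the first one to queue on an empty
wait list adds RWSEM_WAITING_BIAS again -- so the count keeps being
pushed negative even after the writer is long gone.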
>
>> To avoid that, an additional check is added to detect the special
>> case where the reader in the wait_lock critical section would be the
>> only one in the wait queue and no writer is present. When that
>> happens, it can just take the lock and return immediately without
>> further action. Other incoming readers then won't see a waiter
>> present and so won't be forced into the slowpath.
The last sentence above is the key. It is what the other incoming
readers observe that causes the vicious cycle. See below.
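Concretely, the readers are funneled into the slowpath by the sign
check in the reader fast path, which looks roughly like this (the
generic asm-generic/rwsem.h version, simplified):

static inline void __down_read(struct rw_semaphore *sem)
{
        /*
         * Each reader adds RWSEM_ACTIVE_READ_BIAS (== 1 here). A
         * non-positive result means a writer and/or a waiter bias is
         * present, so the reader has to take the slowpath.
         */
        if (unlikely(atomic_long_inc_return_acquire(&sem->count) <= 0))
                rwsem_down_read_failed(sem);
}

Once a queued reader has published RWSEM_WAITING_BIAS, every reader
hitting this check ends up in rwsem_down_read_failed() and contends on
the wait_lock.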
>>
>> The additional code is in the slowpath and so should not have an impact
>> on rwsem performance. However, in the special case listed above, it may
>> greatly improve performance.
>
>> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
>> index 3064c50..bf0570e 100644
>> --- a/kernel/locking/rwsem-xadd.c
>> +++ b/kernel/locking/rwsem-xadd.c
>> @@ -233,8 +233,19 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
> your diff function thingy is busted, this is in fact
> __rwsem_down_read_failed_common you're patching.
Yes, I am patching __rwsem_down_read_failed_common().
>
>> waiter.type = RWSEM_WAITING_FOR_READ;
>>
>> raw_spin_lock_irq(&sem->wait_lock);
>> - if (list_empty(&sem->wait_list))
>> + if (list_empty(&sem->wait_list)) {
>> + /*
>> + * In the unlikely event that the task is the only one in
>> + * the wait queue and a writer isn't present, it can have
>> + * the lock and return immediately without going through
>> + * the remaining slowpath code.
>> + */
>> + if (unlikely(atomic_long_read(&sem->count) >= 0)) {
>> + raw_spin_unlock_irq(&sem->wait_lock);
>> + return sem;
>> + }
>> adjustment += RWSEM_WAITING_BIAS;
>> + }
>> list_add_tail(&waiter.list, &sem->wait_list);
> So without this, we would add ourselves to the list and then
> immediately call __rwsem_mark_wake() on ourselves and fall through
> the wait-loop, or what?
The key here is that we don't want other incoming readers to observe
that there are waiters in the wait queue, and hence be forced into the
slowpath, until the single waiter in the queue is sure that it will
probably need to go to sleep because a writer is present.
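For context, what follows the patched hunk in
__rwsem_down_read_failed_common() is roughly this (simplified from
memory, signal handling omitted):

        list_add_tail(&waiter.list, &sem->wait_list);

        /* we're now waiting on the lock, but no longer actively locking */
        count = atomic_long_add_return(adjustment, &sem->count);

        /*
         * If there are no active locks, or there is no writer and we
         * are first in the queue, wake the front waiter(s) -- possibly
         * ourselves.
         */
        if (count == RWSEM_WAITING_BIAS ||
            (count > RWSEM_WAITING_BIAS &&
             adjustment != -RWSEM_ACTIVE_READ_BIAS))
                __rwsem_mark_wake(sem, RWSEM_WAKE_ANY, &wake_q);

        raw_spin_unlock_irq(&sem->wait_lock);
        wake_up_q(&wake_q);

        /* wait to be given the lock */
        while (true) {
                set_current_state(state);
                if (!waiter.task)
                        break;
                schedule();
        }

So yes, without the new check a lone reader still publishes
RWSEM_WAITING_BIAS before waking itself up again, and it is that
transiently published bias that the concurrently arriving readers trip
over.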
With a constant stream of incoming readers, a major portion of them
will observe a negative count and be serialized to enter the slowpath.
There are certainly other readers that do not observe the negative
count in the window after one reader clears the count in the unlock
path and before a waiter sets the count negative again. Those readers
can go ahead and do the read in parallel. But it is the serialized
readers that cause the performance loss and the spinlock contention
seen in the perf output.
It is the constant stream of incoming readers that sustains the
spinlock queue and the repeated cycle of the count being cleared and
then set negative again.
I can reword the commit log and comment with a v3 patch if you think
that is helpful.
Cheers,
Longman