Re: [PATCH] locking/rwsem: Synchronize task state & waiter->task of readers
From: Waiman Long
Date: Mon Apr 23 2018 - 17:31:04 EST
On 04/23/2018 04:55 PM, Andrea Parri wrote:
> Hi Waiman,
>
> On Mon, Apr 23, 2018 at 12:46:12PM -0400, Waiman Long wrote:
>> On 04/10/2018 01:22 PM, Waiman Long wrote:
>>> It was observed occasionally in PowerPC systems that there was reader
>>> who had not been woken up but that its waiter->task had been cleared.
> Can you provide more details about these observations? (links to LKML
> posts, traces, applications used/micro-benchmarks, ...)
They were customers reported problems on the RHEL7 kernel. Actually, we
are not able to reproduce it in-house. From the symptom, it does look
like a memory synchronization issue which I believe is also there in the
upstream code.
>>> One probable cause of this missed wakeup may be the fact that the
>>> waiter->task and the task state have not been properly synchronized as
>>> the lock release-acquire pair of different locks in the wakeup code path
>>> does not provide a full memory barrier guarantee.
> I guess that by the "pair of different locks" you mean (sem->wait_lock,
> p->pi_lock), right? BTW, __rwsem_down_write_failed_common() is calling
> wake_up_q() _before_ releasing the wait_lock: did you intend to exclude
> this callsite? (why?)
Yes, I am talking about sem->wait_lock and p->pi_lock.
Right, there is an alternative reader wakeup path in
__rwsem_down_write_failed_common() iin addition to unlock wakeup. I
didn't purposely exclude that, but I think it has similar issue. I will
clarify that in the next version.
>
>> So smp_store_mb()
>>> is now used to set waiter->task to NULL to provide a proper memory
>>> barrier for synchronization.
> Mmh; the patch is not introducing an smp_store_mb()... My guess is that
> you are thinking at the sequence:
>
> smp_store_release(&waiter->task, NULL);
> [...]
> smp_mb(); /* added with your patch */
>
> or what am I missing?
I actually thought about doing that at the beginning. The reasons why I
changed my mind were:
1) If there are multiple readers ready to be woken up, you will have to
issue multiple smp_mb instead of just 1 here.
2) smp_store_mb() is implemented with a smp_mb after the store. So it
may not be able to ensure that setting the reader waiter to nil is the
last operation seen by other CPUs as noted in the comment above
smp_store_release().
For these 2 reasons, I decide to add an extra smp_mb at the end.
Cheers,
Longman