Re: [RFC] locking/rwsem: Avoid issuing wakeup before setting the reader waiter to nil

From: Waiman Long
Date: Thu Nov 29 2018 - 12:58:35 EST


On 11/29/2018 12:27 PM, Peter Zijlstra wrote:
> On Thu, Nov 29, 2018 at 12:02:19PM -0500, Waiman Long wrote:
>> On 11/29/2018 11:06 AM, Peter Zijlstra wrote:
>>> Why; at that point we know the wakeup will happen after, which is all we
>>> require.
>>>
>> Thread 1ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ Thread 2ÂÂÂÂÂ Thread 3
>>
>> ÂÂÂ rwsem_down_read_failed()
>> Âraw_spin_lock_irq(&sem->wait_lock);
>> Âlist_add_tail(&waiter.list, &wait_list);
>> Âraw_spin_unlock_irq(&sem->wait_lock);
>> ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ __rwsem_mark_wake();
>> ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ wake_q_add();
>> ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ wake_up_q();
>> ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ waiter->task =
>> NULL; --+
>> Âwhile (true)
>> {ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ |
>> Â
>> set_current_state(TASK_UNINTERRUPTIBLE);ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ
>> |
>> Â if (!waiter.task) //
>> falseÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ |
>> ÂÂÂÂÂ
>> break;ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ |
>> Â
>> schedule();ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ
>> |
>> Â}ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ
>> <-----+
>> ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ wake_up_q(&wake_q);
> I think that thing is horribly whitespace damanaged. At least, it's not
> making sense to me.

I am sorry. It was line-wrapped by my email client. The chart was

Thread 1ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ Thread 2ÂÂÂÂÂ Thread 3
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ down_write();
rwsem_down_read_failed()
Âraw_spin_lock_irq(&sem->wait_lock);
Âlist_add_tail(&waiter.list, &wait_list);
Âraw_spin_unlock_irq(&sem->wait_lock);
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ __rwsem_mark_wake();
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ wake_q_add();
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ wake_up_q();
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ waiter->task = NULL;-+
Âwhile (true) {ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ |
 set_current_state(TASK_UNINTERRUPTIBLE); |
 if (!waiter.task) // false |
ÂÂÂÂÂ break;ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ |
 schedule(); |
Â}ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ <-+
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ wake_up_q(&wake_q);

>> OK, I got confused by the thread racing chart shown in the patch. It
>> will be clearer if the clearing of waiter->task is moved down as shown.
>> Otherwise, moving the clearing of waiter->task before wake_q_add() won't
>> make a difference. So the patch can be a possible fix.
>>
>> Still we are talking about 3 threads racing with each other. The
>> clearing of wake_q.next in wake_up_q() is not atomic and it is hard to
>> predict the racing result of the concurrent wake_q operations between
>> threads 2 and 3. The essence of my tentative patch is to prevent the
>> concurrent wake_q operations in the first place.
> wake_up_q() should, per the barriers in wake_up_process, ensure that if
> wake_a_add() fails, there will be a wakeup of that task after that
> point.
>
> So if we put wake_up_q() at the location where wake_up_process() should
> be, it should all work.
>
> The bug in question is that it can happen at any time after
> wake_q_add(), not necessarily at wake_up_q().

OK, you convinced me. However, that can still lead to anonymous wakeups
that can be problematic if it happens in certain places. Should we try
to reduce anonymous wakeup as much as possible?

Cheers,
Longman