Re: [RFC] locking/rwsem: Avoid issuing wakeup before setting the reader waiter to nil

From: Waiman Long
Date: Thu Nov 29 2018 - 12:02:25 EST


On 11/29/2018 11:06 AM, Peter Zijlstra wrote:
> On Thu, Nov 29, 2018 at 10:21:58AM -0500, Waiman Long wrote:
>
>>>> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
>>>> index 09b1800..50d9af6 100644
>>>> --- a/kernel/locking/rwsem-xadd.c
>>>> +++ b/kernel/locking/rwsem-xadd.c
>>>> @@ -198,15 +198,22 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
>>>> woken++;
>>>> tsk = waiter->task;
>>>>
>>>> - wake_q_add(wake_q, tsk);
>>>> + get_task_struct(tsk);
>>>> list_del(&waiter->list);
>>>> /*
>>>> - * Ensure that the last operation is setting the reader
>>>> + * Ensure calling get_task_struct() before setting the reader
>>>> * waiter to nil such that rwsem_down_read_failed() cannot
>>>> * race with do_exit() by always holding a reference count
>>>> * to the task to wakeup.
>>>> */
>>>> smp_store_release(&waiter->task, NULL);
>>>> + /*
>>>> + * Ensure issuing the wakeup (either by us or someone else)
>>>> + * after setting the reader waiter to nil.
>>>> + */
>>>> + wake_q_add(wake_q, tsk);
>>>> + /* wake_q_add() already take the task ref */
>>>> + put_task_struct(tsk);
>>>> }
>>>>
>>>> adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment;
>> I doubt putting wake_q_add() after clearing waiter->task can really fix
> Why; at that point we know the wakeup will happen after, which is all we
> require.
>
>

Thread 1ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ Thread 2ÂÂÂÂÂ Thread 3

ÂÂÂ rwsem_down_read_failed()
Âraw_spin_lock_irq(&sem->wait_lock);
Âlist_add_tail(&waiter.list, &wait_list);
Âraw_spin_unlock_irq(&sem->wait_lock);
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ __rwsem_mark_wake();
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ wake_q_add();
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ wake_up_q();
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ waiter->task =
NULL; --+
Âwhile (true)
{ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ |
Â
set_current_state(TASK_UNINTERRUPTIBLE);ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ
|
 if (!waiter.task) //
falseÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ |
ÂÂÂÂÂ
break;ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ |
Â
schedule();ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ
|
Â}ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ
<-----+
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ wake_up_q(&wake_q);

OK, I got confused by the thread racing chart shown in the patch. It
will be clearer if the clearing of waiter->task is moved down as shown.
Otherwise, moving the clearing of waiter->task before wake_q_add() won't
make a difference. So the patch can be a possible fix.

Still we are talking about 3 threads racing with each other. The
clearing of wake_q.next in wake_up_q() is not atomic and it is hard to
predict the racing result of the concurrent wake_q operations between
threads 2 and 3. The essence of my tentative patch is to prevent the
concurrent wake_q operations in the first place.

Cheers,
Longman



The second wake_q_add() above will fail to add the task to the second
wake_q because it is still in the first wake_q. So the second
wake_up_q() will not wake up the task because it is not in its wake_q.

Cheers,
Longman