Re: [RFC PATCH 1/1] epoll: use rwlock in order to reduce ep_poll_callback() contention

From: Jason Baron
Date: Wed Dec 05 2018 - 11:38:20 EST




On 12/5/18 6:16 AM, Roman Penyaev wrote:
> On 2018-12-04 18:23, Jason Baron wrote:
>> On 12/3/18 6:02 AM, Roman Penyaev wrote:
>
> [...]
>
>>>
>>>  	ep_set_busy_poll_napi_id(epi);
>>>
>>> @@ -1156,8 +1187,8 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
>>>  	 */
>>>  	if (unlikely(ep->ovflist != EP_UNACTIVE_PTR)) {
>>>  		if (epi->next == EP_UNACTIVE_PTR) {
>>> -			epi->next = ep->ovflist;
>>> -			ep->ovflist = epi;
>>> +			/* Atomically exchange tail */
>>> +			epi->next = xchg(&ep->ovflist, epi);
>>
>> This also relies on the fact that the same epi can't be added to the
>> list in parallel, because the wait queue doing the wakeup will have the
>> wait_queue_head lock. That was a little confusing for me b/c we only had
>> the read_lock() above.
>
> Yes, that path is indeed not obvious: the wq is locked by the
> wake_up_*_poll() call itself, or the caller of wake_up_locked_poll()
> has to make sure wq.lock is taken.
>
> I'll add an explicit comment here, thanks for pointing out.
>
>>
>>>  			if (epi->ws) {
>>>  				/*
>>>  				 * Activate ep->ws since epi->ws may get
>>> @@ -1172,7 +1203,7 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
>>>
>>>  	/* If this file is already in the ready list we exit soon */
>>>  	if (!ep_is_linked(epi)) {
>>> -		list_add_tail(&epi->rdllink, &ep->rdllist);
>>> +		list_add_tail_lockless(&epi->rdllink, &ep->rdllist);
>>>  		ep_pm_stay_awake_rcu(epi);
>>>  	}
>>
>> same for this.
>
> ... and an explicit comment here.
>
>>
>>>
>>> @@ -1197,13 +1228,13 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
>>>  				break;
>>>  			}
>>>  		}
>>> -		wake_up_locked(&ep->wq);
>>> +		wake_up(&ep->wq);
>>
>> why the switch here to the variant that takes wq.lock? Shouldn't the
>> 'reader' side (in this case the one taking the rwlock for write) see
>> all updates in a coherent state at this point?
>
> lockdep inside __wake_up_common expects wq_head->lock to be taken,
> and it does not seem like a good idea to leave the wq naked on the
> wake-up path when several CPUs can enter the wake function. Although
> default_wake_function() is protected by a spinlock inside
> try_to_wake_up(), autoremove_wake_function(), for example, can't be
> called concurrently for the same wq (it removes the wq entry from the
> list). Also, in the bookmark case __wake_up_common adds an entry to
> the list, so it can't be called without any locks.
>
> I understand your concern, and you are right that the read side sees
> the wq entries as stable, but that works only if __wake_up_common
> does not modify anything. That seems to be true now, but it is too
> scary to rely on in the future.

I think it might be interesting, at least for testing, to see whether not
grabbing wq.lock improves your benchmarks any further. fwiw, epoll only
recently started grabbing wq.lock because lockdep required it.

Thanks,

-Jason