Re: [PATCH] epoll: add exclusive wakeups flag

From: Jason Baron
Date: Mon Mar 14 2016 - 15:32:25 EST

On 03/14/2016 01:47 PM, Michael Kerrisk (man-pages) wrote:
> [Restoring CC, which I see I accidentally dropped, one iteration back.]
>
> Hi Jason,
>
> Thanks for the review. I've tweaked one piece to respond to your
> feedback. But I also have another new question below.
>
> On 03/15/2016 03:55 AM, Jason Baron wrote:
>> On 03/11/2016 06:25 PM, Michael Kerrisk (man-pages) wrote:
>>> On 03/11/2016 09:51 PM, Jason Baron wrote:
>>>> On 03/11/2016 03:30 PM, Michael Kerrisk (man-pages) wrote:
>
> [...]
>
>> Hi Michael,
>>
>> Looks good. One comment below.
>>
>> Thanks,
>>
>>> EPOLLEXCLUSIVE (since Linux 4.5)
>>> Sets an exclusive wakeup mode for the epoll file
>>> descriptor that is being attached to the target file
>>> descriptor, fd. When a wakeup event occurs and multiple
>>> epoll file descriptors are attached to the same target
>>> file using EPOLLEXCLUSIVE, one or more of the epoll file
>>> descriptors will receive an event with epoll_wait(2).
>>> The default in this scenario (when EPOLLEXCLUSIVE is not
>>> set) is for all epoll file descriptors to receive an
>>> event. EPOLLEXCLUSIVE is thus useful for avoiding
>>> thundering herd problems in certain scenarios.
>>>
>>> If the same file descriptor is in multiple epoll
>>> instances, some with the EPOLLEXCLUSIVE flag, and others
>>> without, then events will be provided to all epoll
>>> instances that did not specify EPOLLEXCLUSIVE, and at
>>> least one of the epoll instances that did specify
>>> EPOLLEXCLUSIVE.
>>>
>>> The following values may be specified in conjunction
>>> with EPOLLEXCLUSIVE: EPOLLIN, EPOLLOUT, EPOLLWAKEUP, and
>>> EPOLLET. EPOLLHUP and EPOLLERR can also be specified,
>>> but are ignored (as usual). Attempts to specify other
>>
>> I'm not sure 'ignored' is the right wording here. 'EPOLLHUP' and
>> 'EPOLLERR' are always included in the set of events when something is
>> added as EPOLLEXCLUSIVE. This is consistent with the non-EPOLLEXCLUSIVE
>> add case.
>
> Yes.
>
>> So 'EPOLLHUP' and 'EPOLLERR' may be specified but will be
>> included in the set of events on an add, whether they are specified or not.
>
> Yes. I understand your discomfort with the word "ignored", but the
> problem was that, because it made special mention of EPOLLHUP and EPOLLERR,
> your proposed text made it sound as though EPOLLEXCLUSIVE was somehow
> special with respect to these two flags. I wanted to clarify that it is not.
> How about this:
>
> The following values may be specified in conjunction
> with EPOLLEXCLUSIVE: EPOLLIN, EPOLLOUT, EPOLLWAKEUP, and
> EPOLLET. EPOLLHUP and EPOLLERR can also be specified,
> but this is not required: as usual, these events are
> always reported if they occur, regardless of whether
> they are specified in events.
> ?

Yes, nothing special here with respect to EPOLLHUP and EPOLLERR. So this
looks fine to me.
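The point that EPOLLHUP never has to be requested can be checked with a small sketch. Assumptions: Linux 4.5 or later, a pipe as the target file descriptor, and a made-up function name, events_after_writer_closes(). It registers the read end with only EPOLLIN | EPOLLEXCLUSIVE, closes the write end, and returns whatever epoll_wait() reports; EPOLLHUP would be expected in that mask even though it was never asked for.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Sketch: assumes Linux >= 4.5; the function name is hypothetical. */
#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)   /* fallback for older headers (Linux uapi value) */
#endif

/*
 * Register the read end of a pipe with EPOLLIN | EPOLLEXCLUSIVE only
 * (EPOLLHUP is not requested), close the write end, and return the
 * event mask reported by epoll_wait(), or -1 on error or timeout.
 */
static int events_after_writer_closes(void)
{
    int pfd[2], epfd, result = -1;
    struct epoll_event ev = { .events = EPOLLIN | EPOLLEXCLUSIVE };
    struct epoll_event out;

    if (pipe(pfd) == -1)
        return -1;
    epfd = epoll_create1(0);
    ev.data.fd = pfd[0];
    if (epfd != -1 && epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == 0) {
        close(pfd[1]);              /* last writer gone: read end reports hangup */
        pfd[1] = -1;
        if (epoll_wait(epfd, &out, 1, 1000) == 1)
            result = (int) out.events;
    }
    if (epfd != -1)
        close(epfd);
    close(pfd[0]);
    if (pfd[1] != -1)
        close(pfd[1]);
    return result;
}
```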

>
>>> values in events yield an error. EPOLLEXCLUSIVE may be
>>> used only in an EPOLL_CTL_ADD operation; attempts to
>>> employ it with EPOLL_CTL_MOD yield an error. If
>>> EPOLLEXCLUSIVE has been set using epoll_ctl(2), then a
>>> subsequent EPOLL_CTL_MOD on the same epfd, fd pair yields an
>>> error. An epoll_ctl(2) that specifies EPOLLEXCLUSIVE in
>>> events and specifies the target file descriptor fd as an
>>> epoll instance will likewise fail. The error in all of
>>> these cases is EINVAL.
>>>
>>> ERRORS
>>> EINVAL An invalid event type was specified along with
>>> EPOLLEXCLUSIVE in events.
>>>
>>> EINVAL op was EPOLL_CTL_MOD and events included EPOLLEXCLUSIVE.
>>>
>>> EINVAL op was EPOLL_CTL_MOD and the EPOLLEXCLUSIVE flag has
>>> previously been applied to this epfd, fd pair.
>>>
>>> EINVAL EPOLLEXCLUSIVE was specified in events and fd refers
>>> to an epoll instance.
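The four EINVAL cases listed under ERRORS can be exercised with a short sketch. It assumes Linux 4.5 or later, uses a pipe as the ordinary target, and uses EPOLLPRI as an example of an event type that is not in the allowed list; ctl_errno(), struct excl_errors and probe_exclusive_errors() are hypothetical names made up for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Sketch: assumes Linux >= 4.5; all names below are hypothetical. */
#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)   /* fallback for older headers (Linux uapi value) */
#endif

/* Returns 0 if epoll_ctl() succeeds, else the errno value it set. */
static int ctl_errno(int epfd, int op, int fd, uint32_t events)
{
    struct epoll_event ev = { .events = events, .data.fd = fd };
    return epoll_ctl(epfd, op, fd, &ev) == -1 ? errno : 0;
}

/* errno observed for each documented EINVAL case (0 = call succeeded). */
struct excl_errors {
    int bad_event_type;   /* ADD: EPOLLEXCLUSIVE together with EPOLLPRI */
    int mod_with_flag;    /* MOD: events include EPOLLEXCLUSIVE */
    int mod_after_excl;   /* MOD: epfd, fd pair was added with EPOLLEXCLUSIVE */
    int target_is_epoll;  /* ADD: EPOLLEXCLUSIVE and fd is an epoll instance */
};

static int probe_exclusive_errors(struct excl_errors *r)
{
    int pfd[2], epfd, inner, rc = -1;

    if (pipe(pfd) == -1)
        return -1;
    epfd = epoll_create1(0);
    inner = epoll_create1(0);              /* used only as a target fd */
    if (epfd == -1 || inner == -1)
        goto out;

    r->bad_event_type = ctl_errno(epfd, EPOLL_CTL_ADD, pfd[0],
                                  EPOLLIN | EPOLLPRI | EPOLLEXCLUSIVE);

    /* Read end: a valid exclusive add, then a MOD without the flag. */
    if (ctl_errno(epfd, EPOLL_CTL_ADD, pfd[0], EPOLLIN | EPOLLEXCLUSIVE) != 0)
        goto out;
    r->mod_after_excl = ctl_errno(epfd, EPOLL_CTL_MOD, pfd[0], EPOLLIN);

    /* Write end: a normal add, then a MOD that tries to set the flag. */
    if (ctl_errno(epfd, EPOLL_CTL_ADD, pfd[1], EPOLLOUT) != 0)
        goto out;
    r->mod_with_flag = ctl_errno(epfd, EPOLL_CTL_MOD, pfd[1],
                                 EPOLLOUT | EPOLLEXCLUSIVE);

    r->target_is_epoll = ctl_errno(epfd, EPOLL_CTL_ADD, inner,
                                   EPOLLIN | EPOLLEXCLUSIVE);
    rc = 0;
out:
    if (epfd != -1)
        close(epfd);
    if (inner != -1)
        close(inner);
    close(pfd[0]);
    close(pfd[1]);
    return rc;
}
```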
>
> Returning to the second sentence in this description:
>
> When a wakeup event occurs and multiple epoll file
> descriptors are attached to the same target file using
> EPOLLEXCLUSIVE, one or more of the epoll file descriptors
> will receive an event with epoll_wait(2).
>
> There is a point that is unclear to me: what does "target file" refer to?
> Is it an open file description (aka open file table entry) or an inode?
> I suspect the former, but it was not clear in your original text.
>

So from epoll's perspective, the wakeups are associated with a 'wait
queue'. If the open() and the subsequent EPOLL_CTL_ADD (which is done via
file->poll()) result in adding to the same 'wait queue', then we will
get 'exclusive' wakeup behavior.

So in general, I think the answer here is that it's associated with the
inode (I couldn't say with 100% certainty without really looking at all
file->poll() implementations). Certainly, with the 'FIFO' example below,
the two scenarios will have the same behavior with respect to
EPOLLEXCLUSIVE.

Also, the 'non-exclusive' mode would be subject to the same question of
which wait queue the epfd is associated with...

Thanks,

-Jason

> To make this point even clearer, here are two scenarios I'm thinking of.
> In each case, we're talking of monitoring the read end of a FIFO.
>
> ===
>
> Scenario 1:
>
> We have three processes, each of which:
> 1. Creates an epoll instance
> 2. Opens the read end of the FIFO
> 3. Adds the read end of the FIFO to the epoll instance, specifying
> EPOLLEXCLUSIVE
>
> When input becomes available on the FIFO, how many processes
> get a wakeup?
>
> ===
>
> Scenario 2:
>
> A parent process opens the read end of a FIFO and then calls
> fork() three times to create three children. Each child then:
>
> 1. Creates an epoll instance
> 2. Adds the read end of the FIFO to the epoll instance, specifying
> EPOLLEXCLUSIVE
>
> When input becomes available on the FIFO, how many processes
> get a wakeup?
>
> ===
>
> Cheers,
>
> Michael
>
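The two FIFO scenarios above (each process opening the FIFO itself, versus children inheriting the parent's descriptor across fork()) can be turned into a small test program. This is a sketch under several assumptions: Linux 4.5 or later, a FIFO created under /tmp, three children, a 500 ms epoll_wait() timeout, and a 100 ms sleep as a crude way to let the children block before the write. In both runs the parent also opens its own read and write ends before forking, so that a later open() of the write end cannot itself wake the children; that differs slightly from the scenarios as written. The names child() and count_wakeups() are made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Sketch: assumes Linux >= 4.5; child() and count_wakeups() are hypothetical. */
#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)   /* fallback for older headers (Linux uapi value) */
#endif

#define NCHILD 3

/*
 * Child: add the FIFO read end to a new epoll instance with
 * EPOLLIN | EPOLLEXCLUSIVE, tell the parent it is registered, then wait.
 * Exit status: 1 = got an event, 0 = timed out, 2 = setup error.
 */
static void child(const char *path, int shared_rfd, int ready_wfd)
{
    int rfd = shared_rfd >= 0 ? shared_rfd : open(path, O_RDONLY | O_NONBLOCK);
    int epfd = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN | EPOLLEXCLUSIVE };

    if (rfd == -1 || epfd == -1 ||
        epoll_ctl(epfd, EPOLL_CTL_ADD, rfd, &ev) == -1 ||
        write(ready_wfd, "r", 1) != 1)
        _exit(2);
    _exit(epoll_wait(epfd, &ev, 1, 500) == 1 ? 1 : 0);
}

/*
 * Returns how many of the NCHILD children saw the event, or -1 on error.
 * shared == 0: each child opens the FIFO itself (the first scenario).
 * shared == 1: children inherit the parent's read end (the fork() scenario).
 */
static int count_wakeups(int shared)
{
    char path[64], c;
    int ready[2], rfd, wfd, woken = 0, status, i;
    struct timespec settle = { 0, 100 * 1000 * 1000 };   /* 100 ms */

    snprintf(path, sizeof(path), "/tmp/excl-fifo-%d", (int) getpid());
    unlink(path);
    if (mkfifo(path, 0600) == -1 || pipe(ready) == -1)
        return -1;

    /* Open both ends before forking, so no later open() wakes the children. */
    rfd = open(path, O_RDONLY | O_NONBLOCK);
    wfd = open(path, O_WRONLY | O_NONBLOCK);
    if (rfd == -1 || wfd == -1)
        return -1;

    for (i = 0; i < NCHILD; i++) {
        pid_t pid = fork();
        if (pid == -1)
            return -1;
        if (pid == 0)
            child(path, shared ? rfd : -1, ready[1]);
    }
    close(ready[1]);                 /* lets read() see EOF once all children exit */
    for (i = 0; i < NCHILD; i++)     /* wait until every child has registered */
        if (read(ready[0], &c, 1) != 1)
            return -1;
    nanosleep(&settle, NULL);        /* crude: let children block in epoll_wait() */

    if (write(wfd, "x", 1) != 1)     /* input becomes available on the FIFO */
        return -1;

    for (i = 0; i < NCHILD; i++)
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 1)
            woken++;

    close(ready[0]);
    close(rfd);
    close(wfd);
    unlink(path);
    return woken;
}
```

Calling count_wakeups(0) and then count_wakeups(1) from main() runs the two scenarios. Since in both cases every epoll instance ends up on the same pipe wait queue, the two calls would be expected to give the same answer, most likely a single wakeup; the sketch itself only relies on the documented "one or more".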