Re: [PATCH] epoll: add exclusive wakeups flag

From: Jason Baron
Date: Mon Mar 14 2016 - 18:35:17 EST


Hi Michael,

On 03/14/2016 05:03 PM, Michael Kerrisk (man-pages) wrote:
> Hi Jason,
>
> On 03/15/2016 09:01 AM, Michael Kerrisk (man-pages) wrote:
>> Hi Jason,
>>
>> On 03/15/2016 08:32 AM, Jason Baron wrote:
>>>
>>>
>>> On 03/14/2016 01:47 PM, Michael Kerrisk (man-pages) wrote:
>>>> [Restoring CC, which I see I accidentally dropped, one iteration back.]
>
> [...]
>
>>>> Returning to the second sentence in this description:
>>>>
>>>> When a wakeup event occurs and multiple epoll file descriptors
>>>> are attached to the same target file using EPOLLEXCLUSIVE, one
>>>> or more of the epoll file descriptors will receive an event
>>>> with epoll_wait(2).
>>>>
>>>> There is a point that is unclear to me: what does "target file" refer to?
>>>> Is it an open file description (aka open file table entry) or an inode?
>>>> I suspect the former, but it was not clear in your original text.
>>>>
>>>
>>> So from epoll's perspective, the wakeups are associated with a 'wait
>>> queue'. So if the open() and subsequent EPOLL_CTL_ADD (which is done via
>>> file->poll()) results in adding to the same 'wait queue' then we will
>>> get 'exclusive' wakeup behavior.
>>>
>>> So in general, I think the answer here is that it's associated with the
>>> inode (I couldn't say with 100% certainty without really looking at all
>>> file->poll() implementations). Certainly, with the 'FIFO' example below,
>>> the two scenarios will have the same behavior with respect to
>>> EPOLLEXCLUSIVE.
>
> So, I was actually a little surprised by this, and went away and tested
> this point. It appears to me that the two scenarios described below
> do NOT have the same behavior with respect to EPOLLEXCLUSIVE. See below.
>
>> So, in both scenarios, *one or more* processes will get a wakeup?
>> (I'll try to add something to the text to clarify the detail we're
>> discussing.)
>>
>>> Also, the 'non-exclusive' mode would be subject to the same question of
>>> which wait queue the epfd is associated with...
>>
>> I'm not sure of the point you are trying to make here?
>>
>> Cheers,
>>
>> Michael
>>
>>
>>>> To make this point even clearer, here are two scenarios I'm thinking of.
>>>> In each case, we're talking of monitoring the read end of a FIFO.
>>>>
>>>> ===
>>>>
>>>> Scenario 1:
>>>>
>>>> We have three processes each of which
>>>> 1. Creates an epoll instance
>>>> 2. Opens the read end of the FIFO
>>>> 3. Adds the read end of the FIFO to the epoll instance, specifying
>>>> EPOLLEXCLUSIVE
>>>>
>>>> When input becomes available on the FIFO, how many processes
>>>> get a wakeup?
>
> When I test this scenario, all three processes get a wakeup.
>
>>>> ===
>>>>
>>>> Scenario 2:
>>>>
>>>> A parent process opens the read end of a FIFO and then calls
>>>> fork() three times to create three children. Each child then:
>>>>
>>>> 1. Creates an epoll instance
>>>> 2. Adds the read end of the FIFO to the epoll instance, specifying
>>>> EPOLLEXCLUSIVE
>>>>
>>>> When input becomes available on the FIFO, how many processes
>>>> get a wakeup?
>
> When I test this scenario, one process gets a wakeup.
>
> In other words, "target file" appears to mean open file description
> (aka open file table entry), not inode.
>
> This is actually what I suspected might be the case, but now I am
> puzzled. Given what I've discovered and what you suggest are the
> semantics, is the implementation correct? (I suspect that it is,
> but it is at odds with your statement above.) My test programs are
> inline below.
>
> Cheers,
>
> Michael
>

Thanks for the test cases. In your first test case, each process exits
immediately after epoll_wait() returns, and that exit (which closes the
FIFO) is what causes the next wakeup. The 2nd process then returns from
epoll_wait() and exits, and this causes the 3rd wakeup.

So the wakeups are not coming from the write directly, but instead from
the readers doing a close(). If you add some sort of sleep after the
epoll_wait(), you can confirm this behavior. So I believe this is
working as expected.
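
To make the sleep suggestion concrete, here is a rough, untested-against-your-setup
sketch of the kind of change I mean. The helper names add_exclusive() and
wait_then_linger(), and the linger_secs parameter, are made up for
illustration and are not part of your programs:

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1 << 28)
#endif

/* Hypothetical helper: register fd on epfd for exclusive read wakeups.
   Returns 0 on success, or -1 with errno set by epoll_ctl(). */
static int
add_exclusive(int epfd, int fd)
{
    struct epoll_event ev = { 0 };

    ev.events = EPOLLIN | EPOLLEXCLUSIVE;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Hypothetical helper: block in epoll_wait(), then keep all descriptors
   open for linger_secs seconds before returning, so that the caller's
   later exit()/close() does not immediately wake the other waiters.
   Returns whatever epoll_wait() returned. */
static int
wait_then_linger(int epfd, unsigned int linger_secs)
{
    struct epoll_event rev;
    int nready;

    assert(epfd >= 0);

    nready = epoll_wait(epfd, &rev, 1, -1);
    if (nready > 0 && linger_secs > 0)
        sleep(linger_secs);

    return nready;
}
```

Calling something like wait_then_linger(epfd, 5) in place of the bare
epoll_wait() in each reader should leave a window in which only the
wakeup from the write is visible, before the exits start closing the
FIFO.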

Thanks,

-Jason


> ============
>
> /* t_EPOLLEXCLUSIVE_multipen.c
>
> Licensed under GNU GPLv2 or later.
> */
> #include <sys/epoll.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <sys/types.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <string.h>
>
> #define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
>                         } while (0)
>
> #define usageErr(msg, progName) \
>                         do { fprintf(stderr, "Usage: "); \
>                              fprintf(stderr, msg, progName); \
>                              exit(EXIT_FAILURE); } while (0)
>
> #ifndef EPOLLEXCLUSIVE
> #define EPOLLEXCLUSIVE (1 << 28)
> #endif
>
> int
> main(int argc, char *argv[])
> {
>     int fd, epfd, nready;
>     struct epoll_event ev, rev;
>
>     if (argc != 2 || strcmp(argv[1], "--help") == 0)
>         usageErr("%s <FIFO>\n", argv[0]);
>
>     epfd = epoll_create(2);
>     if (epfd == -1)
>         errExit("epoll_create");
>
>     fd = open(argv[1], O_RDONLY);
>     if (fd == -1)
>         errExit("open");
>     printf("Opened %s\n", argv[1]);
>
>     ev.events = EPOLLIN | EPOLLEXCLUSIVE;
>     if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
>         errExit("epoll_ctl");
>
>     nready = epoll_wait(epfd, &rev, 1, -1);
>     if (nready == -1)
>         errExit("epoll_wait");
>     printf("epoll_wait() returned %d\n", nready);
>
>     exit(EXIT_SUCCESS);
> }
>
> ===============
>
> /* t_EPOLLEXCLUSIVE_fork.c
>
> Licensed under GNU GPLv2 or later.
> */
>
> #include <sys/epoll.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <sys/types.h>
> #include <sys/wait.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <string.h>
>
> #define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
>                         } while (0)
>
> #define usageErr(msg, progName) \
>                         do { fprintf(stderr, "Usage: "); \
>                              fprintf(stderr, msg, progName); \
>                              exit(EXIT_FAILURE); } while (0)
>
> #ifndef EPOLLEXCLUSIVE
> #define EPOLLEXCLUSIVE (1 << 28)
> #endif
>
> int
> main(int argc, char *argv[])
> {
>     int fd, epfd, nready;
>     struct epoll_event ev, rev;
>     int cnum;
>
>     if (argc != 2 || strcmp(argv[1], "--help") == 0)
>         usageErr("%s <FIFO>\n", argv[0]);
>
>     fd = open(argv[1], O_RDONLY);
>     if (fd == -1)
>         errExit("open");
>     printf("Opened %s\n", argv[1]);
>
>     for (cnum = 0; cnum < 3; cnum++) {
>         switch (fork()) {
>         case -1:
>             errExit("fork");
>
>         case 0: /* Child */
>             epfd = epoll_create(2);
>             if (epfd == -1)
>                 errExit("epoll_create");
>
>             ev.events = EPOLLIN | EPOLLEXCLUSIVE;
>             if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
>                 errExit("epoll_ctl");
>
>             nready = epoll_wait(epfd, &rev, 1, -1);
>             if (nready == -1)
>                 errExit("epoll_wait");
>             printf("Child %d: epoll_wait() returned %d\n", cnum, nready);
>             exit(EXIT_SUCCESS);
>
>         default:
>             break;
>         }
>     }
>
>     wait(NULL);
>     wait(NULL);
>     wait(NULL);
>
>     exit(EXIT_SUCCESS);
> }
>