Re: [PATCH] epoll: add exclusive wakeups flag

From: Madars Vitolins
Date: Mon Mar 14 2016 - 19:22:19 EST


Hi Jason and Michael,

Hmm... I tried the pipe samples below, but even with a sleep added I saw all processes wake up (maybe I am missing something too). I also tried with EPOLLIN.

On the same basis I created a sample with POSIX message queues using EPOLLIN | EPOLLEXCLUSIVE, and the good news is that it works correctly.

file q.c:
==================
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/epoll.h>
#include <fcntl.h>
#include <sys/wait.h>
#include <errno.h>
#include <mqueue.h>
#include <unistd.h> /* fork(), sleep() */

#define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); \
} while (0)

#define usageErr(msg, progName) \
do { fprintf(stderr, "Usage: "); \
fprintf(stderr, msg, progName); \
exit(EXIT_FAILURE); } while (0)

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1 << 28)
#endif

#define MAX_SIZE 10

int
main (int argc, char *argv[])
{
int epfd, nready;
struct epoll_event ev, rev;
mqd_t fd;
struct mq_attr attr;
char buffer[MAX_SIZE + 1];
int cnum;

/* initialize the queue attributes */
attr.mq_flags = 0;
attr.mq_maxmsg = 5;
attr.mq_msgsize = MAX_SIZE;
attr.mq_curmsgs = 0;

/* cleanup for multiple runs... */
mq_unlink ("/TESTQ");

/* create the message queue */
fd =
mq_open ("/TESTQ", O_CREAT | O_RDWR | O_NONBLOCK, S_IWUSR | S_IRUSR,
&attr);
if (fd == -1)
errExit ("mq_open");

for (cnum = 0; cnum < 3; cnum++)
{
switch (fork ())
{
case -1:
errExit ("fork");

case 0: /* Child */
epfd = epoll_create (2);
if (epfd == -1)
errExit ("epoll_create");

ev.events = EPOLLIN | EPOLLEXCLUSIVE;
if (epoll_ctl (epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
errExit ("epoll_ctl");

printf ("About to wait...\n");
nready = epoll_wait (epfd, &rev, 1, -1);
if (nready == -1)
errExit ("epoll-wait");

printf ("Child %d: epoll_wait() returned %d\n", cnum, nready);
exit (EXIT_SUCCESS);

default:
break;
}
}
sleep (1);
/* send a msg to the queue */
memset (buffer, 0, MAX_SIZE);
if (0 > mq_send (fd, buffer, MAX_SIZE, 0))
errExit ("mq_send");
printf ("msg sent ok...\n");

wait (NULL);
wait (NULL);
wait (NULL);

exit (EXIT_SUCCESS);
}
==================

$ gcc q.c -lrt
$ ./a.out
About to wait...
About to wait...
About to wait...
msg sent ok...
Child 2: epoll_wait() returned 1
^C
$



Best regards,
Madars


Jason Baron @ 2016-03-15 00:35 wrote:
Hi Michael,

On 03/14/2016 05:03 PM, Michael Kerrisk (man-pages) wrote:
Hi Jason,

On 03/15/2016 09:01 AM, Michael Kerrisk (man-pages) wrote:
Hi Jason,

On 03/15/2016 08:32 AM, Jason Baron wrote:


On 03/14/2016 01:47 PM, Michael Kerrisk (man-pages) wrote:
[Restoring CC, which I see I accidentally dropped, one iteration back.]

[...]

Returning to the second sentence in this description:

When a wakeup event occurs and multiple epoll file descriptors
are attached to the same target file using EPOLLEXCLUSIVE,
one or more of the epoll file descriptors will
receive an event with epoll_wait(2).

There is a point that is unclear to me: what does "target file" refer to?
Is it an open file description (aka open file table entry) or an inode?
I suspect the former, but it was not clear in your original text.
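
As a side illustration of the distinction being asked about (not part of the original exchange): descriptors created by dup() or inherited across fork() refer to one open file description, while a second open() of the same path yields a new open file description on the same inode. A minimal sketch that makes this visible through the shared file offset follows; the helper names and the temporary file path are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns 1 if fd_a and fd_b share a file offset (same open file
   description), 0 if they do not, -1 on error. */
static int
shares_offset(int fd_a, int fd_b)
{
    off_t pos;

    /* Start both descriptors at offset 0. */
    if (lseek(fd_a, 0, SEEK_SET) == -1 || lseek(fd_b, 0, SEEK_SET) == -1)
        return -1;
    /* Move only fd_a, then see whether fd_b moved with it. */
    if (lseek(fd_a, 3, SEEK_SET) == -1)
        return -1;
    pos = lseek(fd_b, 0, SEEK_CUR);
    if (pos == -1)
        return -1;
    return pos == 3;
}

/* Hypothetical helper: compares dup() against a second open() of the
   same path. Returns 0 on success, -1 on error. */
static int
ofd_demo(int *dup_shared, int *reopen_shared)
{
    char path[] = "/tmp/ofd_demo_XXXXXX";
    int fd, fd_dup, fd_reopen, rc = -1;

    assert(dup_shared != NULL && reopen_shared != NULL);

    fd = mkstemp(path);                   /* one open file description */
    if (fd == -1)
        return -1;
    fd_dup = dup(fd);                     /* same open file description */
    fd_reopen = open(path, O_RDONLY);     /* new description, same inode */

    if (fd_dup != -1 && fd_reopen != -1) {
        *dup_shared = shares_offset(fd, fd_dup);
        *reopen_shared = shares_offset(fd, fd_reopen);
        rc = 0;
    }

    if (fd_dup != -1)
        close(fd_dup);
    if (fd_reopen != -1)
        close(fd_reopen);
    close(fd);
    unlink(path);
    return rc;
}
```

Calling ofd_demo(&d, &r) sets d to 1 (the dup()ed descriptor follows the offset change) and r to 0 (the reopened descriptor does not). This only illustrates the terminology; it says nothing by itself about which wait queue epoll ends up on.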


So from epoll's perspective, the wakeups are associated with a 'wait
queue'. So if the open() and subsequent EPOLL_CTL_ADD (which is done via
file->poll()) results in adding to the same 'wait queue' then we will
get 'exclusive' wakeup behavior.

So in general, I think the answer here is that it's associated with the
inode (I couldn't say with 100% certainty without really looking at all
file->poll() implementations). Certainly, with the 'FIFO' example below,
the two scenarios will have the same behavior with respect to
EPOLLEXCLUSIVE.

So, I was actually a little surprised by this, and went away and tested
this point. It appears to me that the two scenarios described below
do NOT have the same behavior with respect to EPOLLEXCLUSIVE. See below.

So, in both scenarios, *one or more* processes will get a wakeup?
(I'll try to add something to the text to clarify the detail we're
discussing.)

Also, the 'non-exclusive' mode would be subject to the same question of
which wait queue the epfd is associated with...

I'm not sure of the point you are trying to make here?

Cheers,

Michael


To make this point even clearer, here are two scenarios I'm thinking of.
In each case, we're talking of monitoring the read end of a FIFO.

===

Scenario 1:

We have three processes each of which
1. Creates an epoll instance
2. Opens the read end of the FIFO
3. Adds the read end of the FIFO to the epoll instance, specifying
EPOLLEXCLUSIVE

When input becomes available on the FIFO, how many processes
get a wakeup?

When I test this scenario, all three processes get a wakeup.

===

Scenario 2:

A parent process opens the read end of a FIFO and then calls
fork() three times to create three children. Each child then:

1. Creates an epoll instance
2. Adds the read end of the FIFO to the epoll instance, specifying
EPOLLEXCLUSIVE

When input becomes available on the FIFO, how many processes
get a wakeup?

When I test this scenario, one process gets a wakeup.

In other words, "target file" appears to mean open file description
(aka open file table entry), not inode.

This is actually what I suspected might be the case, but now I am
puzzled. Given what I've discovered and what you suggest are the
semantics, is the implementation correct? (I suspect that it is,
but it is at odds with your statement above.) My test programs are
inline below.

Cheers,

Michael


Thanks for the test cases. So in your first test case, you are exiting
immediately after the epoll_wait() returns. So this is actually causing
the next wakeup. And then the 2nd thread returns from epoll_wait() and
this causes the 3rd wakeup.

So the wakeups are actually not happening from the write directly, but
instead from the readers doing a close(). If you do some sort of sleep
after the epoll_wait() you can confirm the behavior. So I believe this
is working as expected.

Thanks,

-Jason
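
A rough sketch of the check Jason suggests, assuming Scenario 1 (each child opens the FIFO itself): a woken child keeps its descriptors open for a while instead of exiting immediately, so a close() cannot be what wakes its siblings. The helper names, the delays, and the exit-status convention are assumptions made for this example, and the count it returns depends on timing and kernel version, so treat it as something to observe rather than rely on.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1 << 28)
#endif

/* Child body: open the FIFO independently, wait with EPOLLEXCLUSIVE,
   and on wakeup keep the descriptors open for hold_ms before exiting.
   Exit status: 1 = woken, 0 = timed out, 2 = error. */
static void
child_wait(const char *path, int timeout_ms, int hold_ms)
{
    struct epoll_event ev = { 0 }, rev;
    int fd = open(path, O_RDONLY | O_NONBLOCK); /* don't block on open */
    int epfd = epoll_create1(0);
    int n;

    ev.events = EPOLLIN | EPOLLEXCLUSIVE;
    if (fd == -1 || epfd == -1 ||
        epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
        _exit(2);

    n = epoll_wait(epfd, &rev, 1, timeout_ms);
    if (n == 1)
        poll(NULL, 0, hold_ms);  /* sleep with the descriptors still open */
    _exit(n == 1 ? 1 : n == 0 ? 0 : 2);
}

/* Hypothetical helper: forks nchildren waiters on the FIFO at path,
   writes one byte, and returns how many children reported a wakeup
   (-1 on error). hold_ms should exceed timeout_ms. */
static int
count_wakeups(const char *path, int nchildren,
              int settle_ms, int timeout_ms, int hold_ms)
{
    pid_t pids[16];
    int forked = 0, woken = 0, wfd, ok, status, i;

    assert(nchildren > 0 && nchildren <= 16);

    unlink(path);                /* cleanup for multiple runs */
    if (mkfifo(path, 0600) == -1)
        return -1;

    for (i = 0; i < nchildren; i++) {
        pid_t pid = fork();
        if (pid == 0)
            child_wait(path, timeout_ms, hold_ms);  /* never returns */
        if (pid == -1)
            break;
        pids[forked++] = pid;
    }

    poll(NULL, 0, settle_ms);    /* crude: let children reach epoll_wait() */

    /* O_RDWR on a FIFO never blocks on Linux; used here only to write. */
    wfd = open(path, O_RDWR | O_NONBLOCK);
    ok = (forked == nchildren && wfd != -1 && write(wfd, "x", 1) == 1);

    for (i = 0; i < forked; i++)
        if (waitpid(pids[i], &status, 0) != -1 &&
            WIFEXITED(status) && WEXITSTATUS(status) == 1)
            woken++;

    if (wfd != -1)
        close(wfd);
    unlink(path);
    return ok ? woken : -1;
}
```

For example, count_wakeups("/tmp/testfifo", 3, 300, 700, 1000) forks three waiters, writes one byte after 300 ms, and reports how many of them returned from epoll_wait() with an event before their 700 ms timeout. The fixed settle delay is a shortcut; a pipe-based handshake would be more robust.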


============

/* t_EPOLLEXCLUSIVE_multipen.c

Licensed under GNU GPLv2 or later.
*/
#include <sys/epoll.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

#define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); \
} while (0)

#define usageErr(msg, progName) \
do { fprintf(stderr, "Usage: "); \
fprintf(stderr, msg, progName); \
exit(EXIT_FAILURE); } while (0)

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1 << 28)
#endif

int
main(int argc, char *argv[])
{
int fd, epfd, nready;
struct epoll_event ev, rev;

if (argc != 2 || strcmp(argv[1], "--help") == 0)
usageErr("%s <FIFO>\n", argv[0]);

epfd = epoll_create(2);
if (epfd == -1)
errExit("epoll_create");

fd = open(argv[1], O_RDONLY);
if (fd == -1)
errExit("open");
printf("Opened %s\n", argv[1]);

ev.events = EPOLLIN | EPOLLEXCLUSIVE;
if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
errExit("epoll_ctl");

nready = epoll_wait(epfd, &rev, 1, -1);
if (nready == -1)
errExit("epoll-wait");
printf("epoll_wait() returned %d\n", nready);

exit(EXIT_SUCCESS);
}

===============

/* t_EPOLLEXCLUSIVE_fork.c

Licensed under GNU GPLv2 or later.
*/

#include <sys/epoll.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

#define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); \
} while (0)

#define usageErr(msg, progName) \
do { fprintf(stderr, "Usage: "); \
fprintf(stderr, msg, progName); \
exit(EXIT_FAILURE); } while (0)

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1 << 28)
#endif

int
main(int argc, char *argv[])
{
int fd, epfd, nready;
struct epoll_event ev, rev;
int cnum;

if (argc != 2 || strcmp(argv[1], "--help") == 0)
usageErr("%s <FIFO>\n", argv[0]);

fd = open(argv[1], O_RDONLY);
if (fd == -1)
errExit("open");
printf("Opened %s\n", argv[1]);

for (cnum = 0; cnum < 3; cnum++) {
switch (fork()) {
case -1:
errExit("fork");

case 0: /* Child */
epfd = epoll_create(2);
if (epfd == -1)
errExit("epoll_create");

ev.events = EPOLLIN | EPOLLEXCLUSIVE;
if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1)
errExit("epoll_ctl");

nready = epoll_wait(epfd, &rev, 1, -1);
if (nready == -1)
errExit("epoll-wait");
printf("Child %d: epoll_wait() returned %d\n", cnum, nready);
exit(EXIT_SUCCESS);

default:
break;
}
}

wait(NULL);
wait(NULL);
wait(NULL);

exit(EXIT_SUCCESS);
}