Hi Madars,--
On 11/30/2015 04:28 PM, Madars Vitolins wrote:
Hi Jason,
I today did search the mail archive and checked your offered patch did on February, it basically does the some (flag for add_wait_queue_exclusive() + balance).
So I plan to run off some tests with your patch, flag on/off and will provide results. I guess if I pull up 250 or 500 processes (which could real for production environment) waiting on one Q, then there could be a notable difference in performance with EPOLLEXCLUSIVE set or not.
Sounds good. Below is an updated patch if you want to try it - it only
adds the 'EPOLLEXCLUSIVE' flag.
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 1e009ca..265fa7b 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -92,7 +92,7 @@
*/
/* Epoll private bits inside the event mask */
-#define EP_PRIVATE_BITS (EPOLLWAKEUP | EPOLLONESHOT | EPOLLET)
+#define EP_PRIVATE_BITS (EPOLLWAKEUP | EPOLLONESHOT | EPOLLET | EPOLLEXCLUSIVE)
/* Maximum number of nesting allowed inside epoll sets */
#define EP_MAX_NESTS 4
@@ -1002,6 +1002,7 @@ static int ep_poll_callback(wait_queue_t *wait,
unsigned mode, int sync, void *k
unsigned long flags;
struct epitem *epi = ep_item_from_wait(wait);
struct eventpoll *ep = epi->ep;
+ int ewake = 0;
if ((unsigned long)key & POLLFREE) {
ep_pwq_from_wait(wait)->whead = NULL;
@@ -1066,8 +1067,10 @@ static int ep_poll_callback(wait_queue_t *wait,
unsigned mode, int sync, void *k
* Wake up ( if active ) both the eventpoll wait list and the ->poll()
* wait list.
*/
- if (waitqueue_active(&ep->wq))
+ if (waitqueue_active(&ep->wq)) {
+ ewake = 1;
wake_up_locked(&ep->wq);
+ }
if (waitqueue_active(&ep->poll_wait))
pwake++;
@@ -1078,6 +1081,9 @@ out_unlock:
if (pwake)
ep_poll_safewake(&ep->poll_wait);
+ if (epi->event.events & EPOLLEXCLUSIVE)
+ return ewake;
+
return 1;
}
@@ -1095,7 +1101,10 @@ static void ep_ptable_queue_proc(struct file
*file, wait_queue_head_t *whead,
init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);
pwq->whead = whead;
pwq->base = epi;
- add_wait_queue(whead, &pwq->wait);
+ if (epi->event.events & EPOLLEXCLUSIVE)
+ add_wait_queue_exclusive(whead, &pwq->wait);
+ else
+ add_wait_queue(whead, &pwq->wait);
list_add_tail(&pwq->llink, &epi->pwqlist);
epi->nwait++;
} else {
@@ -1861,6 +1870,10 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
if (f.file == tf.file || !is_file_epoll(f.file))
goto error_tgt_fput;
+ if ((epds.events & EPOLLEXCLUSIVE) && (op == EPOLL_CTL_MOD ||
+ (op == EPOLL_CTL_ADD && is_file_epoll(tf.file))))
+ goto error_tgt_fput;
+
/*
* At this point it is safe to assume that the "private_data" contains
* our own data structure.
diff --git a/include/uapi/linux/eventpoll.h b/include/uapi/linux/eventpoll.h
index bc81fb2..925bbfb 100644
--- a/include/uapi/linux/eventpoll.h
+++ b/include/uapi/linux/eventpoll.h
@@ -26,6 +26,9 @@
#define EPOLL_CTL_DEL 2
#define EPOLL_CTL_MOD 3
+/* Add exclusively */
+#define EPOLLEXCLUSIVE (1 << 28)
+
/*
* Request the handling of system wakeup events so as to prevent
system suspends
* from happening while those events are being processed.
During kernel hacking with debug print, with 10 processes waiting on one event source, with original kernel I did see lot un-needed processing inside of eventpoll.c, it got 10x calls to ep_poll_callback() and other stuff for single event, which results with few processes waken up in user space (count probably gets randomly depending on concurrency).
Meanwhile we are not the only ones who talk about this patch, see here: http://stackoverflow.com/questions/33226842/epollexclusive-and-epollroundrobin-flags-in-mainstream-kernel others are asking too.
So what is the current situation with your patch, what is the blocking for getting it into mainline?
If we can show some good test results here I will re-submit it.
Thanks,
-Jason