Re: [rfc 5/7] fs, epoll: Add procfs fdinfo helper
From: Matthew Helsley
Date: Thu Jul 19 2012 - 10:52:38 EST
On Wed, Jun 27, 2012 at 4:01 AM, Cyrill Gorcunov <gorcunov@xxxxxxxxxx> wrote:
> This allows us to print out eventpoll target file descriptor,
> events and data, the /proc/pid/fdinfo/fd consists of
>
> | pos: 0
> | flags: 02
> | tfd: 5 events: 1d data: ffffffffffffffff
>
> This feature is CONFIG_CHECKPOINT_RESTORE only.
>
> Signed-off-by: Cyrill Gorcunov <gorcunov@xxxxxxxxxx>
> CC: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
> CC: Alexey Dobriyan <adobriyan@xxxxxxxxx>
> CC: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> CC: Pavel Emelyanov <xemul@xxxxxxxxxxxxx>
> CC: James Bottomley <jbottomley@xxxxxxxxxxxxx>
> ---
> fs/eventpoll.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 81 insertions(+)
>
> Index: linux-2.6.git/fs/eventpoll.c
> ===================================================================
> --- linux-2.6.git.orig/fs/eventpoll.c
> +++ linux-2.6.git/fs/eventpoll.c
> @@ -38,6 +38,8 @@
> #include <asm/io.h>
> #include <asm/mman.h>
> #include <linux/atomic.h>
> +#include <linux/proc_fs.h>
> +#include <linux/seq_file.h>
>
> /*
> * LOCKING:
> @@ -1897,6 +1899,83 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd,
> return error;
> }
>
> +#if defined(CONFIG_PROC_FS) && defined(CONFIG_CHECKPOINT_RESTORE)
> +
> +struct epitem_fdinfo {
> + struct epoll_event ev;
> + int fd;
> +};
> +
> +static struct epitem_fdinfo *
> +seq_lookup_fdinfo(struct proc_fdinfo_extra *extra, struct eventpoll *ep, loff_t num)
> +{
> + struct epitem_fdinfo *fdinfo = extra->priv;
> + struct epitem *epi = NULL;
> + struct rb_node *rbp;
> +
> + mutex_lock(&ep->mtx);
> + for (rbp = rb_first(&ep->rbr); rbp; rbp = rb_next(rbp)) {
> + if (num-- == 0) {
> + epi = rb_entry(rbp, struct epitem, rbn);
> + fdinfo->fd = epi->ffd.fd;
> + fdinfo->ev = epi->event;
> + break;
This will be incredibly slow. epoll was designed to scale to tens of
thousands of file descriptors. This algorithm is O(N^2) because each
time we show a new epoll item we walk the rb tree again from rb_first()
(we're counting nodes, not searching by key, so it isn't O(N log N)).
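To put rough numbers on that, here is a minimal userspace sketch, not kernel code. It assumes a plain linked list as a stand-in for the rb_first()/rb_next() walk, and all names in it (struct node, build_list, lookup_by_index, visits_restart, visits_single_pass) are hypothetical. It only counts how many nodes get visited when every item is fetched by index versus when the set is walked once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an in-order rb tree walk: following ->next
 * plays the role of rb_next(). */
struct node {
	int fd;
	struct node *next;
};

/* Link an array of count nodes into a list and return its head. */
static struct node *build_list(struct node *storage, long count)
{
	long i;

	for (i = 0; i < count; i++) {
		storage[i].fd = (int)i;
		storage[i].next = (i + 1 < count) ? &storage[i + 1] : NULL;
	}
	return count > 0 ? &storage[0] : NULL;
}

/* Restart-from-first lookup, shaped like the quoted loop: walk from the
 * head until num reaches zero. Each visited node is added to *visits. */
static struct node *lookup_by_index(struct node *first, long num,
				    unsigned long *visits)
{
	struct node *n;

	for (n = first; n; n = n->next) {
		(*visits)++;
		if (num-- == 0)
			return n;
	}
	return NULL;
}

/* Nodes visited when each of count items is fetched by its index. */
static unsigned long visits_restart(struct node *first, long count)
{
	unsigned long visits = 0;
	long i;

	for (i = 0; i < count; i++)
		lookup_by_index(first, i, &visits);
	return visits;
}

/* Nodes visited when the whole set is walked exactly once. */
static unsigned long visits_single_pass(struct node *first)
{
	unsigned long visits = 0;
	struct node *n;

	for (n = first; n; n = n->next)
		visits++;
	return visits;
}
```

With N items the index-based variant visits 1 + 2 + ... + N = N(N+1)/2 nodes, while the single pass visits N. For N = 1000 that is 500500 visits against 1000.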
Also, we could miss one or more later items if one of the earlier
items is removed from the epoll set in between "seq_lookup_fdinfo"
calls. This isn't a problem for checkpoint because we assume the task
(and everything with this eventpoll file in its fd table) is frozen.
However, it means the file will be worse than useless for almost any
other purpose, because other users are unlikely to realize they need to
freeze all the task(s) to get consistent data.
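The skipped-item case can be reproduced with a small userspace model. This is a sketch under stated assumptions, not the kernel code: a sorted array stands in for the rb tree, and struct fdset, fdset_lookup, fdset_remove and dump_with_removal are hypothetical names. The reader fetches items by position, and one earlier item is removed part-way through the dump.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ITEMS 16

/* Hypothetical ordered set of fds standing in for the epoll rb tree. */
struct fdset {
	int fds[MAX_ITEMS];
	size_t count;
};

/* Return the fd at in-order position num, or -1 when num is past the end. */
static int fdset_lookup(const struct fdset *s, size_t num)
{
	return num < s->count ? s->fds[num] : -1;
}

/* Remove fd from the set while keeping the remaining items in order. */
static void fdset_remove(struct fdset *s, int fd)
{
	size_t i, j;

	for (i = 0; i < s->count; i++) {
		if (s->fds[i] == fd) {
			for (j = i + 1; j < s->count; j++)
				s->fds[j - 1] = s->fds[j];
			s->count--;
			return;
		}
	}
}

/* Dump the set by position, the way repeated index lookups would.
 * Once remove_after items have been reported, victim is removed to
 * model a concurrent removal between two lookups.
 * Returns the number of fds written to out[]. */
static size_t dump_with_removal(struct fdset *s, size_t remove_after,
				int victim, int *out)
{
	size_t pos = 0, n = 0;
	int fd;

	while ((fd = fdset_lookup(s, pos)) != -1) {
		out[n++] = fd;
		pos++;
		if (pos == remove_after)
			fdset_remove(s, victim);
	}
	return n;
}
```

Starting from the set {3, 4, 5, 6} and removing fd 3 after two items have been shown, the reader reports 3, 4 and 6. fd 5 slides into position 1, which has already been read, so it never appears even though it was in the set the whole time.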
Cheers,
-Matt Helsley