Re: [PATCH] fs: Improve eventpoll logging to stop indicting timerfd

From: Manish Varma
Date: Mon Mar 22 2021 - 13:15:54 EST


Hi Thomas,

On Thu, Mar 18, 2021 at 6:04 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> Manish,
>
> On Mon, Mar 01 2021 at 19:49, Manish Varma wrote:
>
> > All together, that will give us names like the following:
> >
> > 1) timerfd file descriptor: [timerfd14:system_server]
> > 2) eventpoll top-level per-process wakesource: epoll:system_server
> > 3) eventpoll-on-timerfd per-descriptor wakesource:
> > epollitem:system_server.[timerfd14:system_server]
>
> All together that should be splitted up into a change to eventpoll and
> timerfd.
>

Noted.

> > diff --git a/fs/timerfd.c b/fs/timerfd.c
> > index c5509d2448e3..4249e8c9a38c 100644
> > --- a/fs/timerfd.c
> > +++ b/fs/timerfd.c
> > @@ -46,6 +46,8 @@ struct timerfd_ctx {
> > bool might_cancel;
> > };
> >
> > +static atomic_t instance_count = ATOMIC_INIT(0);
>
> instance_count is misleading as it does not do any accounting of
> instances as the name suggests.
>

Not sure if I am missing a broader point here, but the objective of this
patch is to:
A. To help find the process a given timerfd associated with, and
B. one step further, if there are multiple fds created by a single
process then label each instance using monotonically increasing integer
i.e. "instance_count" to help identify each of them separately.

So, instance_count in my mind helps with "B", i.e. to keep track and
identify each instance of timerfd individually.

> > static LIST_HEAD(cancel_list);
> > static DEFINE_SPINLOCK(cancel_lock);
> >
> > @@ -391,6 +393,9 @@ SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flags)
> > {
> > int ufd;
> > struct timerfd_ctx *ctx;
> > + char task_comm_buf[sizeof(current->comm)];
> > + char file_name_buf[32];
> > + int instance;
> >
> > /* Check the TFD_* constants for consistency. */
> > BUILD_BUG_ON(TFD_CLOEXEC != O_CLOEXEC);
> > @@ -427,7 +432,11 @@ SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flags)
> >
> > ctx->moffs = ktime_mono_to_real(0);
> >
> > - ufd = anon_inode_getfd("[timerfd]", &timerfd_fops, ctx,
> > + instance = atomic_inc_return(&instance_count);
> > + get_task_comm(task_comm_buf, current);
>
> How is current->comm supposed to be unique? And with a wrapping counter
> like the above you can end up with identical file descriptor names.
>
> What's wrong with simply using the PID which is guaranteed to be unique
> for the life time of a process/task?
>

Thanks, and Yes, on a second thought, PID sounds like a better option.
I will address in v2 patch.

> > + snprintf(file_name_buf, sizeof(file_name_buf), "[timerfd%d:%s]",
> > + instance, task_comm_buf);
> > + ufd = anon_inode_getfd(file_name_buf, &timerfd_fops, ctx,
> > O_RDWR | (flags & TFD_SHARED_FCNTL_FLAGS));
> > if (ufd < 0)
> > kfree(ctx);
>
> I actually wonder, whether this should be part of anon_inode_get*().
>

I am curious (and open at the same time) if that will be helpful..
In the case of timerfd, I could see it adds up value by stuffing more
context to the file descriptor name as eventpoll is using the same file
descriptor names as wakesource name, and hence the cost of slightly
longer file descriptor name justifies. But I don't have a solid reason
if this additional cost (of longer file descriptor names) will be
helpful in general with other file descriptors.

> Aside of that this is a user space visible change both for eventpoll and
> timerfd.
>
> Have you carefully investigated whether there is existing user space
> which might depend on the existing naming conventions?
>

I am not sure how I can confirm that for all userspace, but open for
suggestions if you can share some ideas.

However, I have verified and can confirm for Android userspace that
there is no dependency on existing naming conventions for timerfd and
eventpoll wakesource names, if that helps.

> The changelog is silent about this...
>

Noted - will add this into the revised patch.

> Thanks,
>
> tglx

Thanks,
Manish