Re: [RFC v7 1/2] epoll: Implement eventpoll_replace_file()

From: Christian Brauner
Date: Wed May 24 2023 - 13:26:37 EST


On Wed, May 24, 2023 at 06:39:32AM +0000, aloktiagi wrote:
> Introduce a mechanism to replace a file linked in the epoll interface with a new
> file.
>
> eventpoll_replace() finds all instances of the file to be replaced and replaces
> them with the new file and the interested events.h

I've spent a bit more time on this and I have a few more
questions/thoughts.

* What if the seccomp notifier replaces a pollable file with a
non-pollable file? Right now you check that the file is pollable and
if it isn't you're not updating the file references in the epoll
instance for the file descriptor you updated. Why these semantics and
not e.g., removing all instances of that file referring to the updated
fd?

* What if the seccomp notifier replaces the file of a file descriptor
with an epoll file descriptor? If the fd and original file are present
in an epoll instance does that mean you add the epoll file into all
epoll instances? That also means you create nested epoll instances
which are supported but are subject to various limitations. What's the
plan?

* What if you have two threads in the same threadgroup that each have a
seccomp listener profile attached to them. Both have the same fd open.

Now both replace the same fd concurrently. Both threads concurrently
update references in the epoll instances now since the spinlock and
mutex are acquired and reacquired again. Afaict, you can end up with
some instances of the fd temporarily generating events for file1 and
other instances generating events for file2 while the replace is in
progress. Thus generating spurious events and userspace might be
acting on a file descriptor that doesn't yet refer to the new file?
That's possibly dangerous.

Maybe I'm mistaken but if so I'd like to hear the details why that
can't happen.

Thinking about it what if the same file is registered via multiple fds
at the same time? Can't you end up in a scenario where you have the
same fd referring to different files in one or multiple epoll
instance?

I mean, you can get into that situation via dup2() where you change
the file descriptor to refer to a different file but the fd might
still be registered in the epoll instance referring to the old file
provided there's another fd open holding the old file alive.

The difference though is that userspace must've been dumb enough to
actually do that whereas now this can just happen behind their back
misleading them.

Honestly, the kernel can't give you any atomicity in replacing these
references and if so it would require new possibly invasive locking
that would very likely not be acceptable upstream just for the sake of
this feature. I still have a very hard time seeing any of this
happening.

* I haven't looked at the codepath that tries to restore the old file on
failure. That might introduce even more weirdness.