Re: [PATCH RESEND 2/5] seccomp: Add wait_killable semantic to seccomp user notifier

From: Sargun Dhillon
Date: Wed Apr 28 2021 - 13:14:14 EST


On Wed, Apr 28, 2021 at 7:08 AM Tycho Andersen <tycho@tycho.pizza> wrote:
>
> On Wed, Apr 28, 2021 at 03:20:02PM +0200, Rodrigo Campos wrote:
> > On Wed, Apr 28, 2021 at 1:10 PM Rodrigo Campos <rodrigo@xxxxxxxxxx> wrote:
> > >
> > > On Wed, Apr 28, 2021 at 2:22 AM Tycho Andersen <tycho@tycho.pizza> wrote:
> > > >
> > > > On Tue, Apr 27, 2021 at 04:19:54PM -0700, Andy Lutomirski wrote:
> > > > > User notifiers should allow correct emulation. Right now, they don't,
> > > > > but there is no reason they can't.
> > > >
> > > > Thanks for the explanation.
> > > >
> > > > Consider fsmount, which has a,
> > > >
> > > >         ret = mutex_lock_interruptible(&fc->uapi_mutex);
> > > >         if (ret < 0)
> > > >                 goto err_fsfd;
> > > >
> > > > If a regular task is interrupted during that wait, it returns -EINTR
> > > > or whatever back to userspace.
> > > >
> > > > Suppose that we intercept fsmount. The supervisor decides the mount is
> > > > OK, does the fsmount, injects the mount fd into the container, and
> > > > then the tracee receives a signal. At this point, the mount fd is
> > > > visible inside the container. The supervisor gets a notification about
> > > > the signal and revokes the mount fd, but there was some time where it
> > > > was exposed in the container, whereas with the interrupt in the native
> > > > syscall there was never any exposure.
> > >
> > > IIUC, this is solved by my patch, patch 4 of the series. The
> > > supervisor should do the addfd with the flag added in that patch
> > > (SECCOMP_ADDFD_FLAG_SEND) for an atomic "addfd + send".
> >
> > Well, under Andy's proposal handling that is even simpler. If the
> > signal is delivered after we added the fd (note that the container
> > syscall does not return when the signal arrives, as it happens today,
> > it just signals the notifier and continues to wait), we can just
> > ignore the signal and return that (if that is the appropriate thing
> > for that syscall, but I guess after adding an fd there isn't any other
> > reasonable thing to do).
>
> Yes, agreed. After thinking about this more, my example is bogus: the
> kernel doesn't sleep after it installs the fd, so it would ignore any
> signals too.
>
> Even if the kernel *did* sleep after installing the fd, it would still
> be correct emulation to install it and then do whatever the kernel did
> during that sleep. So I withdraw my objection :)
>
> Thanks,
>
> Tycho

Great.

I'll respin the series and add a
SECCOMP_IOCTL_NOTIF_SET_WAIT_KILLABLE command.

We can do the other optimizations mentioned above when
specific use cases come up. I would like to work on preemption
notification after this lands, though.