Re: [PATCH v3 1/2] seccomp: Add wait_killable semantic to seccomp user notifier
From: Sargun Dhillon
Date: Mon May 02 2022 - 12:04:25 EST
On Mon, May 02, 2022 at 04:15:07PM +0200, Rodrigo Campos wrote:
> On Fri, Apr 29, 2022 at 4:32 AM Sargun Dhillon <sargun@xxxxxxxxx> wrote:
> > diff --git a/Documentation/userspace-api/seccomp_filter.rst b/Documentation/userspace-api/seccomp_filter.rst
> > index 539e9d4a4860..204cf5ba511a 100644
> > --- a/Documentation/userspace-api/seccomp_filter.rst
> > +++ b/Documentation/userspace-api/seccomp_filter.rst
> > @@ -271,6 +271,14 @@ notifying process it will be replaced. The supervisor can also add an FD, and
> > respond atomically by using the ``SECCOMP_ADDFD_FLAG_SEND`` flag and the return
> > value will be the injected file descriptor number.
> >
> > +The notifying process can be preempted, resulting in the notification being
> > +aborted. This can be problematic when trying to take actions on behalf of the
> > +notifying process that are long-running and typically retryable (mounting a
> > +filesytem). Alternatively, the at filter installation time, the
> > +``SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV`` flag can be set. This flag makes it
> > +such that when a user notification is received by the supervisor, the notifying
> > +process will ignore non-fatal signals until the response is sent.
>
> Maybe:
>
> This flags ignores non-fatal signals that arrive after the supervisor
> received the notification
>
> I mean, I want to make it clear that if a signal arrives before the
> notification was received by the supervisor, then it will be
> interrupted anyways.
>
>
Added: Signals that are sent prior to the notification being received by
userspace are handled normally.
> > diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> > index db10e73d06e0..9291b0843cb2 100644
> > --- a/kernel/seccomp.c
> > +++ b/kernel/seccomp.c
> > @@ -1485,6 +1512,9 @@ static long seccomp_notify_recv(struct seccomp_filter *filter,
> > mutex_lock(&filter->notify_lock);
> > knotif = find_notification(filter, unotif.id);
> > if (knotif) {
> > + /* Reset the process to make sure it's not stuck */
> > + if (should_sleep_killable(filter, knotif))
> > + complete(&knotif->ready);
> > knotif->state = SECCOMP_NOTIFY_INIT;
> > up(&filter->notif->request);
>
> (I couldn't git-am this locally, so maybe I'm injecting this at the
> wrong parts mentally when looking at the other code for more context.
> Sorry if that is the case :))
>
> Why do we need to complete() only in this error path? As far as I can
> see this is on the error path where the copy to userspace failed and
> we want to reset this notification.
This error path acts as follows
(Say, S: Supervisor, P: Notifying Process, U: User)
P: 2 <-- Pid
P: getppid() // Generated notification
P: Waiting in wait_interruptible state
S: Calls receive notification, and the codepath gets up to the poin
where it's copying the notification to userspace
U: kill -SIGURG 2 // Send a kill signal to the notifying process
P: Waiting in the wait_killable state
S: Kernel fails to copy notification into user memory, and resets
the notification and returns an error
If we do not have the reset, the P will never return to wait interruptible.
This is the only code path that a notification can go init -> sent -> init.
>
> I think that is wrong, we want to wake up the other side not just on
> the error path, but on the non-error path (in fact, do we want to do
> this on the error path? It seems like a no-op, but don't see any
> reason to do it).
>
It's unneccessary. Why do it? It just means we wake up a process without reason.
Wakeups are expensive.
> We _need_ to call complete() in the non error path here so the other
> side wakes up and switches to a killable wait. As we are not doing
> this (for the non error path), this will basically not achieve a
> wait_killable() at all.
>
No, because here, we check if we were waiting interruptible, and
then we switch to wait_killable:
/*
* Check to see if the notifcation got picked up and
* whether we should switch to wait killable.
*/
if (!wait_killable && should_sleep_killable(match, &n))
continue;
This could probably be:
if (fatal_signal_pending(current))
break;
if (!wait_killable && should_sleep_killable(match, &n))
continue;
But, that check for fatal_signal_pending seems to be unneccessary (or something we'll get
for free in the next iteration).
> I think this was probably an oversight adapting the patch from last
> year. Is it possble? Because it seems that in the previous version we
> sent last year[1] (if you can link them next time it will be way
> simpler :)) you had a new ioctl() and the call to complete() was
> handled there, in seccomp_notify_set_wait_killable(). Now, as this is
> part of the filter (and as I said last year, I think this way looks
> better) that call to complete() was completely forgotten.
>
> Is it possible that this is not really working as intended, then? Am I
> missing something?
>
>
> Best,
> Rodrigo
>
>
> [1]: https://lore.kernel.org/all/20210430204939.5152-3-sargun@xxxxxxxxx/