Re: [PATCH v3 1/2] seccomp: Add wait_killable semantic to seccomp user notifier

From: Kees Cook
Date: Fri Apr 29 2022 - 14:21:04 EST


On Fri, Apr 29, 2022 at 05:14:37PM +0000, Sargun Dhillon wrote:
> On Fri, Apr 29, 2022 at 11:42:15AM +0200, Rodrigo Campos wrote:
> > On Fri, Apr 29, 2022 at 4:32 AM Sargun Dhillon <sargun@xxxxxxxxx> wrote:
> > > the concept is searchable. If the notifying process is signaled prior
> > > to the notification being received by the userspace agent, it will
> > > be handled as normal.
> >
> > Why is that? Why not always handle in the same way (if wait killable
> > is set, wait like that)
> >
>
> The goal is to avoid two things:
> 1. Unnecessary work - Oftentimes, we see workloads that implement techniques
> like hedging (also known as request racing[1]). In fact, RFC3484
> (destination address selection) gets implemented where the DNS library
> will connect to many backend addresses and whichever one comes back first
> "wins".
> 2. Side effects - We don't want a situation where a syscall is in progress
> that is non-trivial to rollback (mount), and from user space's perspective
> this syscall never completed.
>
> Blocking before the syscall even starts is excessive. When we looked at this
> we found that with runtimes like Golang, they can get into a bad situation
> if they have many (1000s of) threads that are in the middle of a syscall
> because all of them need to elide prior to GC. In this case the runtime
> prioritizes the liveness of GC vs. the syscalls.
>
> That being said, there may be some syscalls in a filter that need the suggested
> behaviour. I can imagine introducing a new flag
> (say SECCOMP_FILTER_FLAG_WAIT_KILLABLE) that applies to all states.
> Alternatively, in one implementation, I put the behaviour in the data
> field of the return from the BPF filter.

I'd add something like the above to the commit log, just to have it
around.
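
For illustration only, the "data field" alternative mentioned above could
look roughly like the sketch below. It assumes a made-up bit
(EXAMPLE_NOTIF_DATA_WAIT_KILLABLE) carried in the low 16 bits of the
filter's return value. That bit and both helper functions are hypothetical
and not part of any kernel ABI; only the SECCOMP_RET_* masks mirror the
values in the uapi header.

```c
#include <stdint.h>

/* These three values mirror include/uapi/linux/seccomp.h. */
#define SECCOMP_RET_USER_NOTIF  0x7fc00000U /* notify the userspace agent */
#define SECCOMP_RET_ACTION_FULL 0xffff0000U /* mask for the action bits */
#define SECCOMP_RET_DATA        0x0000ffffU /* mask for the data bits */

/* Hypothetical: a per-return "wait killable" request. Not a real ABI. */
#define EXAMPLE_NOTIF_DATA_WAIT_KILLABLE 0x0001U

/* Build the value a filter would return for a syscall it wants notified. */
static uint32_t example_notif_ret(int wait_killable)
{
	uint32_t data = wait_killable ? EXAMPLE_NOTIF_DATA_WAIT_KILLABLE : 0;

	return SECCOMP_RET_USER_NOTIF | (data & SECCOMP_RET_DATA);
}

/* Decode side: is this a user notification that asked for killable wait? */
static int example_wants_wait_killable(uint32_t ret)
{
	if ((ret & SECCOMP_RET_ACTION_FULL) != SECCOMP_RET_USER_NOTIF)
		return 0;

	return (ret & SECCOMP_RET_DATA & EXAMPLE_NOTIF_DATA_WAIT_KILLABLE) != 0;
}
```

Compared with a filter-wide flag, this would let a single filter request
the behaviour for some syscalls (say, mount) and not others, at the cost
of spending bits in the return data.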

--
Kees Cook