[seccomp] Request for a "enable on execve" mode for Seccomp filters
From: Camille Mougey
Date: Wed Oct 28 2020 - 22:16:39 EST
Hello,
(This is my first message to the kernel list, I hope I'm doing it right)
From my understanding, there is no way to delay the activation of
seccomp filters, for instance "until an _execve_ call".
But this might be useful, especially for tools that sandbox other,
non-cooperative executables, such as "systemd" or "FireJail".
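This seems to be a limitation of seccomp specific to the _execve_
system call: the sandboxing tool needs _execve_ itself to start the
target, so a filter installed beforehand cannot deny it. For now, some
tools such as "systemd" explicitly mention this exception and do not
support blocking _execve_ (from the man page):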
> Note that strict system call filters may impact execution and error handling code paths of the service invocation. Specifically, access to the execve system call is required for the execution of the service binary — if it is blocked service invocation will necessarily fail
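To illustrate the limitation, here is a minimal sketch in C (not taken
from systemd or FireJail). A child process installs a filter that makes
_execve_ fail with EPERM, then tries to exec the target itself. The
function names and the "/bin/true" target are only assumptions for the
example. A real filter should also check seccomp_data.arch and cover
_execveat_.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: deny execve with EPERM, allow everything else.
 * Sketch only: no architecture check, execveat is not covered. */
static int install_no_execve_filter(void)
{
    struct sock_filter insns[] = {
        /* Load the system call number. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* If it is execve, fall through to the ERRNO return; else skip it. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_execve, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = sizeof(insns) / sizeof(insns[0]),
        .filter = insns,
    };

    /* Required to install a filter without CAP_SYS_ADMIN. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Returns 1 if the sandboxer's own execve was denied by the filter,
 * 2 if the filter could not be installed, 0 or 3 otherwise, -1 on
 * fork/wait errors. */
int try_exec_under_filter(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        char *argv[] = { "true", NULL };
        char *envp[] = { NULL };

        if (install_no_execve_filter() != 0)
            _exit(2);
        /* The filter is already active, so this call is rejected. */
        execve("/bin/true", argv, envp);
        _exit(errno == EPERM ? 1 : 3);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

On a kernel with seccomp filter support, try_exec_under_filter() is
expected to report that the exec was denied: the filter cannot tell the
sandboxer's initial _execve_ apart from later ones made by the target.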
"FireJail" takes a different approach[1], with a kind of workaround:
the project uses an external library, loaded through the LD_PRELOAD
mechanism, in order to install filters during the loader stage.
This approach, a bit hacky, also has several caveats:
* _openat_, _mmap_, etc. must be allowed in order to reach the
LD_PRELOAD mechanism, and for the crafted library to work;
* it doesn't work for static binaries.
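As a rough sketch of the LD_PRELOAD idea (this is not FireJail's actual
code; the names below are hypothetical), a preloaded library can
install the filter from a constructor, which the dynamic loader runs
before the target's main(). The sketch again omits the architecture
check and _execveat_.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

/* 0 = constructor not run, 1 = filter installed, -1 = install failed. */
int preload_filter_state = 0;

/* Run by the dynamic loader before main(). By this point the loader has
 * already needed openat, mmap, etc., and a static binary never gets here. */
__attribute__((constructor))
static void preload_install_filter(void)
{
    struct sock_filter insns[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* execve: fall through to the ERRNO return; anything else: allow. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_execve, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = sizeof(insns) / sizeof(insns[0]),
        .filter = insns,
    };

    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
        prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0)
        preload_filter_state = -1;
    else
        preload_filter_state = 1;
}
```

Assuming the file is saved as noexec.c, it could be built with
"gcc -shared -fPIC -o libnoexec.so noexec.c" and used as
"LD_PRELOAD=./libnoexec.so ./target". Everything the loader executes
before the constructor runs is unfiltered, which is the caveat listed
above.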
I only see hackish ways to restrict the use of _execve_ in a
non-cooperative executable. These methods all seem bypassable
and are not satisfactory from a security point of view.
IMHO, a way to prepare filters and enable them only on the next
_execve_ would have some benefits:
* have a way to restrict _execve_ in a non-cooperative executable;
* install filters atomically, i.e. before the _execve_ system call
returns. That would limit race conditions, and the very first
instructions of potentially untrusted binaries would already be subject
to seccomp filters. It would also ensure that only one thread is running
when the filters are enabled.
From what I understand, there is a related use case[2] where the
"enable on exec" mode would also be a solution.
Thanks for your attention,
C. Mougey
[1]: https://github.com/netblue30/firejail/issues/3685
[2]: https://lore.kernel.org/linux-man/202010250759.F9745E0B6@keescook/