Re: [PATCH v1 0/6] seccomp: Implement constant action bitmaps

From: Jann Horn
Date: Thu Sep 24 2020 - 10:06:52 EST


On Thu, Sep 24, 2020 at 3:40 PM Rasmus Villemoes
<linux@xxxxxxxxxxxxxxxxxx> wrote:
> On 24/09/2020 01.29, Kees Cook wrote:
> > rfc: https://lore.kernel.org/lkml/20200616074934.1600036-1-keescook@xxxxxxxxxxxx/
> > alternative: https://lore.kernel.org/containers/cover.1600661418.git.yifeifz2@xxxxxxxxxxxx/
> > v1:
> > - rebase to for-next/seccomp
> > - finish X86_X32 support for both pinning and bitmaps
> > - replace TLB magic with Jann's emulator
> > - add JSET insn
> >
> > TODO:
> > - add ALU|AND insn
> > - significantly more testing
> >
> > Hi,
> >
> > This is a refresh of my earlier constant action bitmap series. It looks
> > like the RFC was missed on the container list, so I've CCed it now. :)
> > I'd like to work from this series, as it handles the multi-architecture
> > stuff.
>
> So, I agree with Jann's point that the only thing that matters is that
> always-allowed syscalls are indeed allowed fast.
>
> But one thing I'm wondering about and I haven't seen addressed anywhere:
> Why build the bitmap on the kernel side (with all the complexity of
> having to emulate the filter for all syscalls)? Why can't userspace just
> hand the kernel "here's a new filter: the syscalls in this bitmap are
> always allowed noquestionsasked, for the rest, run this bpf". Sure, that
> might require a new syscall or extending seccomp(2) somewhat, but isn't
> that a _lot_ simpler? It would probably also mean that the bpf we do get
> handed is a lot smaller. Userspace might need to pass a couple of
> bitmaps, one for each relevant arch, but you get the overall idea.

It's not really a lot of logic, though, and there are a bunch of
different things in userspace that talk to the seccomp() syscall that
would have to be updated if we made this part of the UAPI. libseccomp,
Chrome, Android, OpenSSH, bubblewrap, ... - overall, if we can make
the existing interface faster, it'll be less effort, and there will be
less code duplication (because otherwise every user of seccomp will
have to implement the same thing in userspace).
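
For a sense of scale, here is a toy userspace sketch of that kind of
emulation. It is not the code from the series: the struct, the
function name and the handful of supported opcodes are made up for
illustration, and only the numeric constants (cBPF opcodes, seccomp_data
offsets, SECCOMP_RET_ALLOW) match the real UAPI values. The idea is to
run the filter with a known nr and arch and give up as soon as it looks
at anything else:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sock_filter (same field layout). */
struct mock_insn {
	uint16_t code;
	uint8_t jt, jf;
	uint32_t k;
};

/* Numeric values match the classic BPF opcode encoding. */
enum {
	OP_LD_W_ABS = 0x20,	/* BPF_LD  | BPF_W    | BPF_ABS */
	OP_JA       = 0x05,	/* BPF_JMP | BPF_JA            */
	OP_JEQ_K    = 0x15,	/* BPF_JMP | BPF_JEQ  | BPF_K  */
	OP_JGT_K    = 0x25,	/* BPF_JMP | BPF_JGT  | BPF_K  */
	OP_JGE_K    = 0x35,	/* BPF_JMP | BPF_JGE  | BPF_K  */
	OP_JSET_K   = 0x45,	/* BPF_JMP | BPF_JSET | BPF_K  */
	OP_RET_K    = 0x06	/* BPF_RET | BPF_K             */
};

#define MOCK_RET_ALLOW 0x7fff0000U	/* same value as SECCOMP_RET_ALLOW */

/*
 * Returns true only if the filter reaches "RET ALLOW" for this nr/arch
 * without reading anything but seccomp_data.nr (offset 0) and
 * seccomp_data.arch (offset 4). Anything unknown means "not constant".
 */
static bool mock_always_allows(const struct mock_insn *prog, size_t len,
			       uint32_t nr, uint32_t arch)
{
	uint32_t acc = 0;
	size_t pc = 0;

	/* cBPF jumps only go forward, so this loop terminates. */
	while (pc < len) {
		const struct mock_insn *insn = &prog[pc];
		bool taken;

		switch (insn->code) {
		case OP_LD_W_ABS:
			if (insn->k == 0)
				acc = nr;
			else if (insn->k == 4)
				acc = arch;
			else
				return false;	/* depends on ip or args */
			pc++;
			continue;
		case OP_RET_K:
			return insn->k == MOCK_RET_ALLOW;
		case OP_JA:
			pc += 1 + (size_t)insn->k;
			continue;
		case OP_JEQ_K:
			taken = acc == insn->k;
			break;
		case OP_JGT_K:
			taken = acc > insn->k;
			break;
		case OP_JGE_K:
			taken = acc >= insn->k;
			break;
		case OP_JSET_K:
			taken = (acc & insn->k) != 0;
			break;
		default:
			return false;	/* unsupported insn: be conservative */
		}
		pc += 1 + (taken ? insn->jt : insn->jf);
	}
	return false;	/* ran off the end of the program */
}
```

Calling this once per syscall number (and per arch) is what would fill
in the bitmap; any filter that touches the arguments for a given nr just
stays on the slow path for that nr.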

Doing this internally with the old UAPI also means that we're not
creating any additional commitments in terms of UAPI - if we come up
with something better in the future, we can just rip this stuff out.
If we created a new UAPI, we'd have to stay, in some form, compatible
with it forever.

> I'm also a bit worried about the performance of doing that emulation;
> that's constant extra overhead for, say, launching a docker container.
>
> Regardless of how the kernel's bitmap gets created, something like
>
> + if (nr < NR_syscalls) {
> + if (test_bit(nr, bitmaps->allow)) {
> + *filter_ret = SECCOMP_RET_ALLOW;
> + return true;
> + }
>
> probably wants some nospec protection somewhere to avoid the irony of
> seccomp() being used actively by bad guys.

Good point...
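
As a rough idea of what that protection could look like, here is a
userspace mock of the fast path with the index clamped after the bounds
check. All mock_* names and the table size are hypothetical. In the
kernel one would presumably use array_index_nospec() from
<linux/nospec.h> instead of open-coding the mask; the mask below only
mirrors the arithmetic of the generic fallback and does nothing about
speculation when compiled as ordinary userspace code:

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_NR_SYSCALLS 512UL		/* hypothetical table size */
#define MOCK_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define MOCK_RET_ALLOW 0x7fff0000U	/* same value as SECCOMP_RET_ALLOW */

struct mock_bitmaps {
	unsigned long allow[(MOCK_NR_SYSCALLS + MOCK_BITS_PER_LONG - 1) /
			    MOCK_BITS_PER_LONG];
};

/*
 * All ones if index < size, zero otherwise, computed without a branch.
 * Assumes 0 < size <= LONG_MAX.
 */
static unsigned long mock_index_mask(unsigned long index, unsigned long size)
{
	unsigned long top = ~(index | (size - 1UL - index)) >>
			    (MOCK_BITS_PER_LONG - 1);

	return 0UL - top;
}

static void mock_set_allowed(struct mock_bitmaps *b, unsigned long nr)
{
	if (nr < MOCK_NR_SYSCALLS)
		b->allow[nr / MOCK_BITS_PER_LONG] |=
			1UL << (nr % MOCK_BITS_PER_LONG);
}

/* Returns true and sets *filter_ret if nr is in the always-allow bitmap. */
static bool mock_fast_path(const struct mock_bitmaps *b, unsigned long nr,
			   unsigned int *filter_ret)
{
	if (nr < MOCK_NR_SYSCALLS) {
		/* Clamp so a mispredicted bounds check cannot index past the end. */
		nr &= mock_index_mask(nr, MOCK_NR_SYSCALLS);
		if (b->allow[nr / MOCK_BITS_PER_LONG] &
		    (1UL << (nr % MOCK_BITS_PER_LONG))) {
			*filter_ret = MOCK_RET_ALLOW;
			return true;
		}
	}
	return false;
}
```

The point is only that the clamp sits between the bounds check and the
bitmap load, so an out-of-range nr reads bit 0 at worst.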