Re: [PATCH v7 3/6] seccomp: add a way to get a listener fd from ptrace

From: Christian Brauner
Date: Wed Oct 10 2018 - 13:15:16 EST


On Wed, Oct 10, 2018 at 09:54:58AM -0700, Tycho Andersen wrote:
> On Wed, Oct 10, 2018 at 05:39:57PM +0200, Christian Brauner wrote:
> > On Wed, Oct 10, 2018 at 05:33:43PM +0200, Jann Horn wrote:
> > > On Wed, Oct 10, 2018 at 5:32 PM Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
> > > > On Tue, Oct 9, 2018 at 9:36 AM Jann Horn <jannh@xxxxxxxxxx> wrote:
> > > > > +cc selinux people explicitly, since they probably have opinions on this
> > > >
> > > > I just spent about twenty minutes working my way through this thread,
> > > > and digging through the containers archive trying to get a good
> > > > understanding of what you guys are trying to do, and I'm not quite
> > > > sure I understand it all. However, from what I have seen, this
> > > > approach looks very ptrace-y to me (I imagine to others as well based
> > > > on the comments) and because of this I think ensuring the usual ptrace
> > > > access controls are evaluated, including the ptrace LSM hooks, is the
> > > > right thing to do.
> > >
> > > Basically the problem is that this new ptrace() API does something
> > > that doesn't just influence the target task, but also every other task
> > > that has the same seccomp filter. So the classic ptrace check doesn't
> > > work here.
> >
> > Just to throw this into the mix: then maybe ptrace() isn't the right
> > interface and we should just go with the native seccomp() approach for
> > now.
>
> Please no :).
>
> I don't buy your arguments that 3-syscalls vs. one is better. If I'm
> doing this setup with a new container, I have to do
> clone(CLONE_FILES), do this seccomp thing, so that my parent can pick
> it up again, then do another clone without CLONE_FILES, because in the
> general case I don't want to share my fd table with the container,
> wait on the middle task for errors, etc. So we're still doing a bunch
> of setup, and it feels more awkward than ptrace, with at least as many
> syscalls, and it only works for your children.

You're talking about the case where you have already shot yourself in
the foot by blocking basically all other sensible ways of getting the fd
out.

Also, this was meant to show that parts of your initial justification
for implementing the ptrace() way of getting an fd don't really hold up.
Even with ptrace() you can get into situations where you're not able to
get an fd (see prior threads).

>
> I don't mind leaving capable(CAP_SYS_ADMIN) for the ptrace() part,

Again (see prior threads), I do not particularly care whether we have
ptrace() or not, as long as we have the seccomp() way of getting an fd
out that does not require capable(CAP_SYS_ADMIN).

> though. So if that's ok, then I think we can agree :)
>
> Tycho