Re: [PATCH] seccomp: add ptrace commands for suspend/resume

From: Oleg Nesterov
Date: Wed Jun 03 2015 - 12:42:26 EST


On 06/03, Tycho Andersen wrote:
>
> On Tue, Jun 02, 2015 at 08:28:29PM +0200, Oleg Nesterov wrote:
> > On 06/01, Tycho Andersen wrote:
> > >
> > > --- a/include/linux/seccomp.h
> > > +++ b/include/linux/seccomp.h
> > > @@ -25,6 +25,9 @@ struct seccomp_filter;
> > > struct seccomp {
> > > int mode;
> > > struct seccomp_filter *filter;
> > > +#ifdef CONFIG_CHECKPOINT_RESTORE
> > > + bool suspended;
> > > +#endif
> >
> > Then afaics you need to change copy_seccomp() to clear ->suspended.
> > At least if the child is not traced.
>
> Yes, thank you.

And if we really need to play with TIF_NOTSC, then copy_seccomp() should
set it too if SUSPEND has cleared it in the parent's flags.
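
To make the two copy_seccomp() points concrete, here is a rough userspace
model, not kernel code. The struct and function names (model_task,
model_copy_seccomp, and so on) are hypothetical stand-ins, and the sketch
assumes the strict mode is the one that implies the no-TSC flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel's definitions. */
enum { MODEL_MODE_DISABLED = 0, MODEL_MODE_STRICT = 1 };

struct model_seccomp {
    int mode;        /* which seccomp mode is active */
    bool suspended;  /* the field the patch adds */
};

struct model_task {
    struct model_seccomp seccomp;
    bool traced;     /* is a tracer attached to this task? */
    bool notsc;      /* stands in for TIF_NOTSC */
};

/*
 * Model of what copy_seccomp() would have to do at fork time.
 * The caller sets child->traced before calling.
 */
void model_copy_seccomp(const struct model_task *parent,
                        struct model_task *child)
{
    assert(parent != NULL && child != NULL);

    /* Plain copy first, as the existing code does. */
    child->seccomp = parent->seccomp;
    child->notsc = parent->notsc;

    /* A traced child keeps whatever state the tracer arranged. */
    if (child->traced)
        return;

    /* An untraced child must never inherit the suspended state. */
    child->seccomp.suspended = false;

    /*
     * Assumption: SUSPEND cleared the no-TSC flag in the parent, so the
     * child has to get it back when the (strict) mode requires it.
     */
    if (parent->seccomp.suspended &&
        child->seccomp.mode == MODEL_MODE_STRICT)
        child->notsc = true;
}
```

The point of the model is only that both the flag in struct seccomp and the
thread flag need fixing up in the child, and that a plain struct copy does
neither.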

> > But why do we bother to play with TIF_NOTSC, could you explain?
>
> The procedure for restoring is to call seccomp suspend, restore the
> seccomp filters (and potentially other stuff), and then resume them at
> the end. If the other stuff happens to use RDTSC, the process gets
> killed because TIF_NOTSC has been set.

This is clear, I just thought that CRIU doesn't use rdtsc on behalf of
the traced task...

> We can work around this in criu by doing the seccomp restore as the
> very last thing before the final sigreturn,

Not sure I understand... You need to suspend at "dump" time too afaics,
otherwise, say, syscall_seized() can fail because this syscall is nacked
by seccomp?

> but that seems like the
> seccomp suspend API is incomplete, IMO. However, since both you and
> Andy complained, perhaps I should remove it :)

Well, this is up to you ;)

But. Note that a process can also disable TSC via PR_SET_TSC. So if
dump or restore can't work without enabling TSC you probably want to
handle this case too.
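
For reference, the PR_SET_TSC case can be sketched from userspace as
below. prctl(PR_GET_TSC) and prctl(PR_SET_TSC) are the real x86-only
interfaces; the helper names (get_tsc_mode, tsc_enable_saving,
tsc_restore) are made up for illustration. The sketch acts on the calling
process, so a checkpoint tool would have to run something like it in the
context of the target task, which is an assumption about how it would be
used.

```c
#include <sys/prctl.h>

/* Returns PR_TSC_ENABLE or PR_TSC_SIGSEGV, or -1 if unsupported. */
int get_tsc_mode(void)
{
    int mode = 0;

    if (prctl(PR_GET_TSC, (unsigned long)&mode, 0, 0, 0) != 0)
        return -1;
    return mode;
}

/*
 * Remember the current TSC mode in *saved and make sure the TSC is
 * readable. Returns 0 on success, -1 on failure.
 */
int tsc_enable_saving(int *saved)
{
    int mode = get_tsc_mode();

    if (mode < 0)
        return -1;
    *saved = mode;
    if (mode == PR_TSC_ENABLE)
        return 0;  /* nothing to change */
    return prctl(PR_SET_TSC, PR_TSC_ENABLE, 0, 0, 0) == 0 ? 0 : -1;
}

/* Put back the mode recorded by tsc_enable_saving(). */
int tsc_restore(int saved)
{
    if (get_tsc_mode() == saved)
        return 0;  /* already in the saved mode */
    return prctl(PR_SET_TSC, saved, 0, 0, 0) == 0 ? 0 : -1;
}
```

The save/restore pair is the part that matters: whatever re-enables the TSC
also has to remember that the task asked for PR_TSC_SIGSEGV and put it back.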

And this makes me think that this needs a separate interface. I dunno.

> > And I am not sure I understand why do we need the additional security
> > check, but I leave this to you and Andy.
>
> Yes, it is required to prevent the case Pavel mentions (although there
> are other ways to get around seccomp with ptrace, the goal here is to
> not depend on that behavior so that when it is eventually fixed this
> doesn't break).

I still do not think it makes any sense. Again, if you can trace this
process then you can disable the filtering anyway. Let's assume that
seccomp_run_filters() acks, say, sys_getpid(). Or fork() in the case
Pavel mentioned, this doesn't matter. Now you can force the tracee to
call this syscall, then change syscall_nr.

But as I said I won't argue, please forget.

> Ok, this has changed slightly with the "always resume on
> detach/unlink" change Pavel suggested,

To remind, it is not easy to restore TIF_NOTSC if the tracer dies.

PTRACE_DETACH can do this because the tracee can't be woken up. But
personally I'd prefer an explicit RESUME request rather than "rely
on PTRACE_DETACH".

If we avoid the TSC games, then, again, please consider
PTRACE_O_SECCOMP_DISABLE. This will solve the problems with
fork/detach/tracer-death automatically.
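
A toy model of why an option would handle these cases by itself. Everything
here is hypothetical, including the bit value and the model_* names; it
only illustrates that when "disabled" is derived from the ptrace state at
check time, there is no separate flag to clean up. It also assumes the
forked child is not auto-attached.

```c
#include <stdbool.h>

/* Hypothetical bit; not a real ptrace option value. */
#define MODEL_O_SECCOMP_DISABLE 0x1u

struct model_tracee {
    bool has_filter;           /* seccomp filter installed */
    bool traced;               /* a tracer is attached */
    unsigned int ptrace_opts;  /* options set by that tracer */
};

/* Filtering is skipped only while a tracer holds the option. */
bool model_seccomp_enforced(const struct model_tracee *t)
{
    if (!t->has_filter)
        return false;
    return !(t->traced && (t->ptrace_opts & MODEL_O_SECCOMP_DISABLE));
}

/* Detach, or the tracer dying, drops the ptrace state as a whole. */
void model_detach(struct model_tracee *t)
{
    t->traced = false;
    t->ptrace_opts = 0;
}

/* Assumed: the child inherits the filter but starts out untraced. */
struct model_tracee model_fork(const struct model_tracee *parent)
{
    struct model_tracee child = {
        .has_filter = parent->has_filter,
        .traced = false,
        .ptrace_opts = 0,
    };
    return child;
}
```

In this model nothing like ->suspended has to be cleared on fork or on
detach, because there is no per-seccomp state to forget about.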

Oleg.
