Re: [PATCH v2 4/4] samples: Add example of using PTRACE_GETFD in conjunction with user trap

From: Christian Brauner
Date: Tue Dec 10 2019 - 11:38:54 EST


On Tue, Dec 10, 2019 at 08:07:45AM -0800, Sargun Dhillon wrote:
> On Tue, Dec 10, 2019 at 3:10 AM Christian Brauner
> <christian.brauner@xxxxxxxxxx> wrote:
> >
> > [I'm expanding the Cc to a few Firefox and glibc people since we've been
> > been talking about replacing SECCOMP_RET_TRAP with
> > SECCOMP_RET_USER_NOTIF for a bit now because the useage of
> > SECCOMP_RET_TRAP in the broker blocks desirable core glibc changes.
> > Even if just for their lurking pleasure. :)]
> >
> > On Mon, Dec 09, 2019 at 09:46:35PM +0100, Oleg Nesterov wrote:
> > > On 12/09, Christian Brauner wrote
> > >
> > > I agree, and I won't really argue...
> > >
> > > but the changelog in 2/4 says
> > >
> > > The requirement that the tracer has attached to the tracee prior to the
> > > capture of the file descriptor may be lifted at a later point.
> > >
> > > so may be we should do this right now?
> >
> > I think so, yes. This doesn't strike me as premature optimization but
> > rather as a core design questions.
> >
> > >
> > > plus this part
> > >
> > > @@ -1265,7 +1295,8 @@ SYSCALL_DEFINE4(ptrace, long, request, long, pid, unsigned long, addr,
> > > }
> > >
> > > ret = ptrace_check_attach(child, request == PTRACE_KILL ||
> > > - request == PTRACE_INTERRUPT);
> > > + request == PTRACE_INTERRUPT ||
> > > + request == PTRACE_GETFD);
> > >
> > > actually means "we do not need ptrace, but we do not know where else we
> > > can add this fd_install(get_task_file()).
> >
> > Right, I totally get your point and I'm not a fan of this being in
> > ptrace() either.
> >
> > The way I see is is that the main use-case for this feature is the
> > seccomp notifier and I can see this being useful. So the right place to
> > plumb this into might just be seccomp and specifically on to of the
> > notifier.
> > If we don't care about getting and setting fds at random points of
> > execution it might make sense to add new options to the notify ioctl():
> >
> > #define SECCOMP_IOCTL_NOTIF_GET_FD SECCOMP_IOWR(3, <sensible struct>)
> > #define SECCOMP_IOCTL_NOTIF_SET_FD SECCOMP_IOWR(4, <sensible struct>)
> >
> > which would let you get and set fds while the supervisee is blocked.
> >
> > Christian
> Doesn't SECCOMP_IOCTL_NOTIF_GET_FD have some ambiguity to it?

As Tycho mentioned, this is why we have a the tid of the calling task
but we also have a cookie per request.
The cookie is useful so that you can do
- receive request <chocolate> cookie
- open(/proc/<pid>{/mem})
- verify <chocolate> cookie still exists
- <chocolate> cookie still exists -> file descriptor refers to correct
task
- <chocolate> cookie gone -> task has been recycled

> Specifically, because
> multiple processes can have the same notifier attached to them? If we
> choose to go down the
> route of introducing an ioctl (which I'm not at all opposed to), I
> would rather do it on pidfd. We
> can then plumb seccomp notifier to send pidfd instead of raw pid. In
> the mean time, folks
> can just open up /proc/${PID}, and do the check cookie dance.
>
> Christian,
> As the maintainer of pidfd, what do you think?

Let me quote what I wrote to the Mozilla folks today. :)

"(One thing that always strikes me is that if my pidfd patches would've
been ready back when we did the seccomp notifier we could've added a
pidfd argument to the seccomp notifier kernel struct and if a flag is
set given back a pidfd alongside the notifier fd. This way none of this
revalidting the id stuff would've been necessary and you could also
safely translate from a pidfd into a /proc/<pid> directory to e.g. open
/proc/<pid>/mem. Anyway, that's not out of scope. One could still
write a patch for that to add a pidfd argument under a new flag to the
kernel struct. Should be rather trivial.)"

So yeah, it crossed my mind. ;)

I really would like to have this placed under a flag though...
I very much dislike the idea of receiving any kind of fd
- _especially a pidfd_ - implicitly.
So ideally this would be a flag to the receive ioctl(). Kees just got my
SECCOMP_USER_NOTIF_FLAG_CONTINUE patchset merged for v5.5 which adds the

#define SECCOMP_USER_NOTIF_FLAG_CONTINUE (1UL << 0)

flag which when set in the send case (i.e. supervisor -> kernel) will
cause the syscall to be executed.

When we add a new flag to get a pidfd it might make sense to rename the
CONTINUE flag in master before v5.5 is out to

#define SECCOMP_USER_NOTIF_SEND_FLAG_CONTINUE (1UL << 0)

to indicate that it's only valid for the SEND ioctl().

Then we add

#define SECCOMP_USER_NOTIF_RECV_FLAG_PIDFD (1UL << 0)

for v5.6. This way send and receive flags are named differently for
clarity. (I don't care about the name being long. Other people might
though _shrug_.)

Christian