Re: For review: seccomp_user_notif(2) manual page [v2]

From: Jann Horn
Date: Wed Oct 28 2020 - 21:49:14 EST


On Mon, Oct 26, 2020 at 10:55 AM Michael Kerrisk (man-pages)
<mtk.manpages@xxxxxxxxx> wrote:
> static bool
> getTargetPathname(struct seccomp_notif *req, int notifyFd,
> char *path, size_t len)
> {
> char procMemPath[PATH_MAX];
>
> snprintf(procMemPath, sizeof(procMemPath), "/proc/%d/mem", req->pid);
>
> int procMemFd = open(procMemPath, O_RDONLY);
> if (procMemFd == -1)
> errExit("\tS: open");
>
> /* Check that the process whose info we are accessing is still alive.
> If the SECCOMP_IOCTL_NOTIF_ID_VALID operation (performed
> in checkNotificationIdIsValid()) succeeds, we know that the
> /proc/PID/mem file descriptor that we opened corresponds to the
> process for which we received a notification. If that process
> subsequently terminates, then read() on that file descriptor
> will return 0 (EOF). */
>
> checkNotificationIdIsValid(notifyFd, req->id);
>
> /* Read bytes at the location containing the pathname argument
> (i.e., the first argument) of the mkdir(2) call */
>
> ssize_t nread = pread(procMemFd, path, len, req->data.args[0]);
> if (nread == -1)
> errExit("pread");

As discussed at
<https://lore.kernel.org/r/CAG48ez0m4Y24ZBZCh+Tf4ORMm9_q4n7VOzpGjwGF7_Fe8EQH=Q@xxxxxxxxxxxxxx>,
we need to re-check checkNotificationIdIsValid() after reading remote
memory but before using the read value in any way. Otherwise, the
syscall could in the meantime get interrupted by a signal handler, the
signal handler could return, and then the function that performed the
syscall could free() allocations or return (thereby freeing buffers on
the stack).

In essence, this pread() is (unavoidably) a potential use-after-free
read; and to make that not have any security impact, we need to check
whether UAF read occurred before using the read value. This should
probably be called out elsewhere in the manpage, too...

Now, of course, **reading** is the easy case. The difficult case is if
we have to **write** to the remote process... because then we can't
play games like that. If we write data to a freed pointer, we're
screwed, that's it. (And for somewhat unrelated bonus fun, consider
that /proc/$pid/mem is originally intended for process debugging,
including installing breakpoints, and will therefore happily write
over "readonly" private mappings, such as typical mappings of
executable code.)

So, uuuuh... I guess if anyone wants to actually write memory back to
the target process, we'd better come up with some dedicated API for
that, using an ioctl on the seccomp fd that magically freezes the
target process inside the syscall while writing to its memory, or
something like that? And until then, the manpage should have a big fat
warning that writing to the target's memory is simply not possible
(safely).

> if (nread == 0) {
> fprintf(stderr, "\tS: pread() of /proc/PID/mem "
> "returned 0 (EOF)\n");
> exit(EXIT_FAILURE);
> }