Re: For review: seccomp_user_notif(2) manual page [v2]
From: Michael Kerrisk (man-pages)
Date: Thu Oct 29 2020 - 10:23:25 EST
Hello Jann,
On 10/29/20 2:42 AM, Jann Horn wrote:
> On Mon, Oct 26, 2020 at 10:55 AM Michael Kerrisk (man-pages)
> <mtk.manpages@xxxxxxxxx> wrote:
>> static bool
>> getTargetPathname(struct seccomp_notif *req, int notifyFd,
>> char *path, size_t len)
>> {
>> char procMemPath[PATH_MAX];
>>
>> snprintf(procMemPath, sizeof(procMemPath), "/proc/%d/mem", req->pid);
>>
>> int procMemFd = open(procMemPath, O_RDONLY);
>> if (procMemFd == -1)
>> errExit("\tS: open");
>>
>> /* Check that the process whose info we are accessing is still alive.
>> If the SECCOMP_IOCTL_NOTIF_ID_VALID operation (performed
>> in checkNotificationIdIsValid()) succeeds, we know that the
>> /proc/PID/mem file descriptor that we opened corresponds to the
>> process for which we received a notification. If that process
>> subsequently terminates, then read() on that file descriptor
>> will return 0 (EOF). */
>>
>> checkNotificationIdIsValid(notifyFd, req->id);
>>
>> /* Read bytes at the location containing the pathname argument
>> (i.e., the first argument) of the mkdir(2) call */
>>
>> ssize_t nread = pread(procMemFd, path, len, req->data.args[0]);
>> if (nread == -1)
>> errExit("pread");
>
> As discussed at
> <https://lore.kernel.org/r/CAG48ez0m4Y24ZBZCh+Tf4ORMm9_q4n7VOzpGjwGF7_Fe8EQH=Q@xxxxxxxxxxxxxx>,
> we need to re-check checkNotificationIdIsValid() after reading remote
> memory but before using the read value in any way. Otherwise, the
> syscall could in the meantime get interrupted by a signal handler, the
> signal handler could return, and then the function that performed the
> syscall could free() allocations or return (thereby freeing buffers on
> the stack).
>
> In essence, this pread() is (unavoidably) a potential use-after-free
> read; and to make that not have any security impact, we need to check
> whether UAF read occurred before using the read value. This should
> probably be called out elsewhere in the manpage, too...
Thanks very much for pointing me at this!
So, I want to conform that the fix to the code is as simple as
adding a check following the pread() call, something like:
[[
ssize_t nread = pread(procMemFd, path, len, req->data.args[argNum]);
if (nread == -1)
errExit("Supervisor: pread");
if (nread == 0) {
fprintf(stderr, "\tS: pread() of /proc/PID/mem "
"returned 0 (EOF)\n");
exit(EXIT_FAILURE);
}
if (close(procMemFd) == -1)
errExit("Supervisor: close-/proc/PID/mem");
+ /* Once again check that the notification ID is still valid. The
+ case we are particularly concerned about here is that just
+ before we fetched the pathname, the target's blocked system
+ call was interrupted by a signal handler, and after the handler
+ returned, the target carried on execution (past the interrupted
+ system call). In that case, we have no guarantees about what we
+ are reading, since the target's memory may have been arbitrarily
+ changed by subsequent operations. */
+
+ if (!notificationIdIsValid(notifyFd, req->id, "post-open"))
+ return false;
+
/* We have no guarantees about what was in the memory of the target
process. We therefore treat the buffer returned by pread() as
untrusted input. The buffer should be terminated by a null byte;
if not, then we will trigger an error for the target process. */
if (strnlen(path, nread) < nread)
return true;
]]
> Now, of course, **reading** is the easy case. The difficult case is if
> we have to **write** to the remote process... because then we can't
> play games like that. If we write data to a freed pointer, we're
> screwed, that's it. (And for somewhat unrelated bonus fun, consider
> that /proc/$pid/mem is originally intended for process debugging,
> including installing breakpoints, and will therefore happily write
> over "readonly" private mappings, such as typical mappings of
> executable code.)
>
> So, uuuuh... I guess if anyone wants to actually write memory back to
> the target process, we'd better come up with some dedicated API for
> that, using an ioctl on the seccomp fd that magically freezes the
> target process inside the syscall while writing to its memory, or
> something like that? And until then, the manpage should have a big fat
> warning that writing to the target's memory is simply not possible
> (safely).
>
>> if (nread == 0) {
>> fprintf(stderr, "\tS: pread() of /proc/PID/mem "
>> "returned 0 (EOF)\n");
>> exit(EXIT_FAILURE);
>> }
> .
I'll think over some changes to the text of the manual page.
Cheers,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/