Re: For review: seccomp_user_notif(2) manual page [v2]

From: Michael Kerrisk (man-pages)
Date: Sat Oct 31 2020 - 04:55:33 EST

Next message: Chengsong Ke: "[PATCH v2] ubifs: Fix the printing type of c->big_lpt"
Previous message: Christoph Hellwig: "Re: [PATCH 02/18] block: open code kobj_map into in block/genhd.c"
In reply to: Jann Horn: "Re: For review: seccomp_user_notif(2) manual page [v2]"
Next in thread: Sargun Dhillon: "Re: For review: seccomp_user_notif(2) manual page [v2]"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 10/30/20 8:20 PM, Jann Horn wrote:
> On Thu, Oct 29, 2020 at 8:14 PM Michael Kerrisk (man-pages)
> <mtk.manpages@xxxxxxxxx> wrote:
>> On 10/29/20 2:42 AM, Jann Horn wrote:
>>> As discussed at
>>> <https://lore.kernel.org/r/CAG48ez0m4Y24ZBZCh+Tf4ORMm9_q4n7VOzpGjwGF7_Fe8EQH=Q@xxxxxxxxxxxxxx>,
>>> we need to re-check checkNotificationIdIsValid() after reading remote
>>> memory but before using the read value in any way. Otherwise, the
>>> syscall could in the meantime get interrupted by a signal handler, the
>>> signal handler could return, and then the function that performed the
>>> syscall could free() allocations or return (thereby freeing buffers on
>>> the stack).
>>>
>>> In essence, this pread() is (unavoidably) a potential use-after-free
>>> read; and to make that not have any security impact, we need to check
>>> whether UAF read occurred before using the read value. This should
>>> probably be called out elsewhere in the manpage, too...
>>>
>>> Now, of course, **reading** is the easy case. The difficult case is if
>>> we have to **write** to the remote process... because then we can't
>>> play games like that. If we write data to a freed pointer, we're
>>> screwed, that's it. (And for somewhat unrelated bonus fun, consider
>>> that /proc/$pid/mem is originally intended for process debugging,
>>> including installing breakpoints, and will therefore happily write
>>> over "readonly" private mappings, such as typical mappings of
>>> executable code.)
>>>
>>> So, uuuuh... I guess if anyone wants to actually write memory back to
>>> the target process, we'd better come up with some dedicated API for
>>> that, using an ioctl on the seccomp fd that magically freezes the
>>> target process inside the syscall while writing to its memory, or
>>> something like that? And until then, the manpage should have a big fat
>>> warning that writing to the target's memory is simply not possible
>>> (safely).
>>
>> Thank you for your very clear explanation! It turned out to be
>> trivially easy to demonstrate this issue with a slightly modified
>> version of my program.
>>
>> As well as the change to the code example that I already mentioned
>> my reply of a few hours ago, I've added the following text to the
>> page:
>>
>> Caveats regarding the use of /proc/[tid]/mem
>> The discussion above noted the need to use the
>> SECCOMP_IOCTL_NOTIF_ID_VALID ioctl(2) when opening the
>> /proc/[tid]/mem file of the target to avoid the possibility of
>> accessing the memory of the wrong process in the event that the
>> target terminates and its ID is recycled by another (unrelated)
>> thread. However, the use of this ioctl(2) operation is also
>> necessary in other situations, as explained in the following
>> pargraphs.
>
> (nit: paragraphs)

I spotted that one also already. But thanks for reading carefully!

>> Consider the following scenario, where the supervisor tries to
>> read the pathname argument of a target's blocked mount(2) system
>> call:
> [...]
>> Seem okay?
>
> Yeah, sounds good.
>
>> By the way, is there any analogous kind of issue concerning
>> pidfd_getfd()? I'm thinking not, but I wonder if I've missed
>> something.
>
> When it is used by a seccomp supervisor, you mean? I think basically
> the same thing applies - when resource identifiers (such as memory
> addresses or file descriptors) are passed to a syscall, it generally
> has to be assumed that those identifiers may become invalid and be
> reused as soon as the syscall has returned.

I probably needed to be more explicit. Would the following (i.e., a
single cookie check) not be sufficient to handle the above scenario.
Here, the target is making a syscall a system call that employs the
file descriptor 'tfd':

T: makes syscall that triggers notification
S: Get notification
S: pidfd = pidfd_open(T, 0);
S: sfd = pifd_getfd(pidfd, tfd, 0)
S: check that the cookie is still valid
S: do operation with sfd [*]

By contrast, I can see that we might want to do multiple cookie
checks in the /proc/PID/mem case, since the supervisor might do
multiple reads.

Or, do you mean: there really needs to be another cookie check after
the point [*], since, if the the target's syscall was interrupted
and 'tfd' was closed/resused, then the supervisor would be operating
with a file descriptor that refers to an open file description
(a "struct file") that is no longer meaningful in the target?
(Thinking about it, I think this probably is what you mean, but
I want to confirm.)

Thanks,

Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

Next message: Chengsong Ke: "[PATCH v2] ubifs: Fix the printing type of c->big_lpt"
Previous message: Christoph Hellwig: "Re: [PATCH 02/18] block: open code kobj_map into in block/genhd.c"
In reply to: Jann Horn: "Re: For review: seccomp_user_notif(2) manual page [v2]"
Next in thread: Sargun Dhillon: "Re: For review: seccomp_user_notif(2) manual page [v2]"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]