Re: [PATCH] kernel: introduce prctl(PR_LOG_UACCESS)

From: David Hildenbrand
Date: Wed Sep 22 2021 - 13:47:26 EST


On 22.09.21 08:18, Peter Collingbourne wrote:
This patch introduces a kernel feature known as uaccess logging.
With uaccess logging, the userspace program passes the address and size
of a so-called uaccess buffer to the kernel via a prctl(). The prctl()
is a request for the kernel to log any uaccesses made during the next
syscall to the uaccess buffer. When the next syscall returns, the address
one past the end of the logged uaccess buffer entries is written to the
location specified by the third argument to the prctl(). In this way,
the userspace program may enumerate the uaccesses logged to the uaccess
buffer to determine which accesses occurred.
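
A rough sketch of how a client might drive the interface described above.
Several details are assumptions made only for illustration and are not taken
from the patch: the numeric value of PR_LOG_UACCESS, the order of the prctl()
arguments, the buffer size being given in bytes, and the two-field entry
layout (struct uaccess_entry). On a kernel without the patch the prctl() call
is expected to fail with an error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/prctl.h>

/* ASSUMPTION: placeholder option number, not the value used by the patch. */
#ifndef PR_LOG_UACCESS
#define PR_LOG_UACCESS 0x7fff0001
#endif

/* ASSUMPTION: hypothetical log entry layout (address and size of one uaccess). */
struct uaccess_entry {
    uint64_t addr;
    uint64_t size;
};

/*
 * Ask the kernel to log the uaccesses of the next syscall into buf.
 * *end_out starts out equal to buf (an empty log); per the description
 * above, the kernel would overwrite it with the address one past the
 * last logged entry when the next syscall returns.
 * ASSUMPTION: argument order and size-in-bytes are guesses.
 */
int request_uaccess_log(struct uaccess_entry *buf, size_t nentries,
                        struct uaccess_entry **end_out)
{
    assert(buf != NULL && end_out != NULL);
    *end_out = buf;
    return prctl(PR_LOG_UACCESS, (unsigned long)buf,
                 (unsigned long)(nentries * sizeof(*buf)),
                 (unsigned long)end_out, 0UL);
}

/* Enumerate the logged entries in [begin, end) and sum the bytes accessed. */
uint64_t uaccess_log_total_bytes(const struct uaccess_entry *begin,
                                 const struct uaccess_entry *end)
{
    uint64_t total = 0;
    for (const struct uaccess_entry *e = begin; e != end; ++e)
        total += e->size;
    return total;
}
```

A sanitizer would presumably check each entry against its shadow memory
rather than sum the sizes; the loop only shows the enumeration from the start
of the buffer to the reported end address.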

Uaccess logging has several use cases focused around bug detection
tools:

1) Userspace memory safety tools such as ASan, MSan, HWASan and tools
making use of the ARM Memory Tagging Extension (MTE) need to monitor
all memory accesses in a program so that they can detect memory
errors. For accesses made purely in userspace, this is achieved
via compiler instrumentation, or for MTE, via direct hardware
support. However, accesses made by the kernel on behalf of the
user program via syscalls (i.e. uaccesses) are invisible to these
tools. With MTE there is some level of error detection possible in
the kernel (in synchronous mode, bad accesses generally result in
returning -EFAULT from the syscall), but by the time we get back to
userspace we've lost the information about the address and size of the
failed access, which makes it harder to produce a useful error report.

With the current versions of the sanitizers, we address this by
interposing the libc syscall stubs with a wrapper that checks the
memory based on what we believe the uaccesses will be. However, this
creates a maintenance burden: each syscall must be annotated with
its uaccesses in order to be recognized by the sanitizer, and these
annotations must be continuously updated as the kernel changes. This
is especially burdensome for syscalls such as ioctl(2) which have a
large surface area of possible uaccesses.

2) Verifying the validity of kernel accesses. This can be achieved in
conjunction with the userspace memory safety tools mentioned in (1).
Even a sanitizer whose syscall wrappers have complete knowledge of
the kernel's intended API may vary from the kernel's actual uaccesses
due to kernel bugs. A sanitizer with knowledge of the kernel's actual
uaccesses may produce more accurate error reports that reveal such
bugs.

An example of such a bug, which was found by an earlier version of this
patch together with a prototype client of the API in HWASan, was fixed
by commit d0efb16294d1 ("net: don't unconditionally copy_from_user
a struct ifreq for socket ioctls"). Although this bug turned out to
be relatively harmless, it was a bug nonetheless and it's always possible
that more serious bugs of this sort may be introduced in the future.

3) Kernel fuzzing. We may use the list of reported kernel accesses to
guide a kernel fuzzing tool such as syzkaller (so that it knows which
parts of user memory to fuzz), as an alternative to providing the tool
with a list of syscalls and their uaccesses (which again thanks to
(2) may not be accurate).

All signals except SIGKILL and SIGSTOP are masked for the interval
between the prctl() and the next syscall in order to prevent handlers
for intervening asynchronous signals from issuing syscalls that may
cause uaccesses from the wrong syscall to be logged.
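
For comparison, the same SIGKILL/SIGSTOP exemption can be observed from
userspace with sigprocmask(2): a request to block every signal silently
leaves those two unblocked. The sketch below only illustrates that existing
userspace behaviour on Linux; it does not exercise the in-kernel masking done
by this patch, and the helper name blocked_after_fill() is made up for the
example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/*
 * Block all signals, then report whether sig actually ended up in the
 * calling thread's signal mask. The previous mask is restored before
 * returning.
 * Returns 1 if sig was blocked, 0 if it was not, -1 on error.
 */
int blocked_after_fill(int sig)
{
    sigset_t all, old, cur;
    int blocked;

    if (sigfillset(&all) != 0)
        return -1;
    if (sigprocmask(SIG_BLOCK, &all, &old) != 0)
        return -1;

    /* A NULL set leaves the mask unchanged and just reads it back. */
    if (sigprocmask(SIG_BLOCK, NULL, &cur) != 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }

    blocked = sigismember(&cur, sig);
    sigprocmask(SIG_SETMASK, &old, NULL);
    return blocked;
}
```

On Linux, blocked_after_fill(SIGUSR1) reports the signal as blocked, while
blocked_after_fill(SIGKILL) and blocked_after_fill(SIGSTOP) do not.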

Stupid question: can this be exploited from user space to effectively disable SIGKILL for a long time ... and do we care?

Like, the application allocates a bunch of memory, issues the prctl() and spins in user space. What would happen if the OOM killer selects this task as a target and does a do_send_sig_info(SIGKILL, SEND_SIG_PRIV, ...) ?

--
Thanks,

David / dhildenb