Re: [PATCH RFC] bpf: Add support for reading user pointers

From: Will Deacon
Date: Tue May 07 2019 - 05:53:44 EST


On Mon, May 06, 2019 at 10:57:37PM +0100, Qais Yousef wrote:
> On 05/06/19 16:58, Joel Fernandes wrote:
> > > If you're trying to dereference a pointer to userspace using
> > > probe_kernel_read(), that clearly isn't going to work.
> >
> > Ok. Thanks for confirming as well. The existing code has this bug and these
> > patches fix it.
>
> 5.1-rc7 and 4.9.173 stable both managed to read the path in do_sys_open() on my
> Juno-r2 board using the defconfig in the tree.

That's not surprising: Juno-r2 only features v8.0 CPUs, so doesn't have PAN
or UAO capabilities. The SoC Joel is talking about is 8.2, so has both of
those.

Here's some background which might help...

PAN (Privileged Access Never) prevents the kernel from inadvertently
accessing userspace and will cause a page (permission) fault if such an
access is made outside of the standard uaccess routines. This means that
in those routines (e.g. get_user()) we have to toggle the PAN state in the
same way that x86 toggles SMAP. This can be expensive and was part of the
motivation for the adoption of things like unsafe_get_user() on x86.

On arm64, we have a set of so-called "translated" memory access instructions
which can be used to perform unprivileged accesses to userspace from within
the kernel even when PAN is enabled. Using these special instructions (e.g.
LDTR) in our uaccess routines can therefore remove the need to toggle PAN.
Sounds great, right? Well, that all falls apart when the uaccess routines
are used on kernel addresses as in probe_kernel_read() because they will
fault when trying to dereference a kernel pointer.

The answer is UAO (User Access Override). When UAO is set, the translated
memory access instructions behave the same as non-translated accesses.
Therefore we can toggle UAO in set_fs() so that it is set when we're using
KERNEL_DS and cleared when we're using USER_DS.

The side-effect of this is that when KERNEL_DS is set on a system that
implements both PAN and UAO, passing a user pointer to our uaccess routines
will return -EFAULT. In other words, set_fs() can be thought of as a
selector between the kernel and user address spaces, which are distinct.

Joel -- does disabling UAO in your .config "fix" the issue? If so, feel
free to use some of the above text in a commit message if it helps to
justify your change.

Cheers,

Will