Re: [kernel-hardening] Re: [PATCH v9 1/4] syscalls: Verify address limit before returning to user-mode

From: Al Viro
Date: Wed May 10 2017 - 04:09:40 EST


On Wed, May 10, 2017 at 09:37:04AM +0200, Arnd Bergmann wrote:

> > How about trying to remove all of them? If we could actually get rid
> > of all of them, we could drop the arch support, and we'd get faster,
> > simpler, shorter uaccess code throughout the kernel.

BTW, not all get_user() under KERNEL_DS are plain loads. There is an
exception - probe_kernel_read().

> > The ones in kernel/compat.c are generally garbage. They should be
> > using compat_alloc_user_space(). Ditto for kernel/power/user.c.
>
> compat_alloc_user_space() has some problems too, it adds
> complexity to a rarely-tested code path and can add some noticeable
> overhead in cases where user space access is slow because of
> extra checks.
>
> It's clearly better than set_fs(), but the way I prefer to convert the
> code is to avoid both and instead move compat handlers next to
> the native code, and splitting out the common code between native
> and compat mode into a helper that takes a regular kernel pointer.
>
> I think that's what both Al has done in the past on compat_ioctl()
> and select() and what Christoph does in his latest series, but
> it seems worth pointing out for others that decide to help out here.

Folks, reducing the amount of places where we play with set_fs() is certainly
a good thing. Getting rid of them completely is something entirely different;
I have tried to plot out patch series in this direction many times during the
last 5 years or so, but it's not going to be easy. Tomorrow I can start
posting my notes in that direction (and there are tons of those, unfortunately
mixed with git grep results, highly unprintable personal comments, etc.);
just let me grab some sleep first...

BTW, slow userland access is not just due to extra checks; access_ok(),
in particular, is pretty much noise. The real PITA comes from the things
like STAC/CLAC on recent x86. Or hardware overhead of cross-address-space
block copy insn (e.g. on s390, where it's optimized for multi-cacheline
blocks). Or things like uml, where it's a matter of walking the page
tables for each sodding __get_user(). It's not always just a matter of
address space limit...