Re: [PATCH] x86: uaccess: fix regression in unsafe_get_user

From: Jann Horn
Date: Mon Feb 18 2019 - 16:13:42 EST


On Mon, Feb 18, 2019 at 8:15 PM Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> On Mon, Feb 18, 2019 at 5:04 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> > > Another would be to have the buffer passed to flush_buffer() (i.e.
> > > the callback of decompress_fn) allocated with 4 bytes of padding
> > > past the part where the unpacked piece of data is placed for the
> > > callback to find. As in,
> > >
> > > diff --git a/lib/decompress_inflate.c b/lib/decompress_inflate.c
> > > index 63b4b7eee138..ca3f7ecc9b35 100644
> > > --- a/lib/decompress_inflate.c
> > > +++ b/lib/decompress_inflate.c
> > > @@ -48,7 +48,7 @@ STATIC int INIT __gunzip(unsigned char *buf, long len,
> > > rc = -1;
> > > if (flush) {
> > > out_len = 0x8000; /* 32 K */
> > > - out_buf = malloc(out_len);
> > > + out_buf = malloc(out_len + 4);
> >
> > +8 actually.
> >
> > > } else {
> > > if (!out_len)
> > > out_len = ((size_t)~0) - (size_t)out_buf; /* no limit */
> > >
> > > for gunzip/decompress and similar ones for bzip2, etc. The contents
> > > layout doesn't have anything to do with that...
> >
> > Right. That works nicely.
> >
>
> This seems like it's just papering over the underlying problem: with
> Jann's new checks in place, strncpy_from_user() is simply buggy. Does
> the patch below look decent? It's only compile-tested, but it's
> conceptually straightforward.

Nit: The "This could be optimized" comment has unbalanced parentheses around it.
Nit: Maybe switch the order of "max" and "IS_UNALIGNED(src+res)" in
the loop that handles the unaligned prefix, or tell the compiler that
"max" is likely to be non-zero? That might be a tiny bit more
efficient in the aligned case? (Sketch below.) But I don't know much
about making code fast, so feel free to ignore this.

I think there is still similar code in some other places in arch/x86;
back when I did the conversion to _ASM_EXTABLE_UA, I stumbled over a
few similar-looking things. In particular:

load_unaligned_zeropad() (used in some VFS code that looks pretty
performance-critical) still uses _ASM_EXTABLE for reading from kernel
memory; it has a comment saying:
/*
* Load an unaligned word from kernel space.
*
* In the (very unlikely) case of the word being a page-crosser
* and the next page not being mapped, take the exception and
* return zeroes in the non-existing part.
*/
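
(In case it helps to have it spelled out: what the fixup described by
that comment computes is roughly the following, written as C purely
for illustration -- the real thing is inline asm with an extable entry
attached to the unaligned load:)

unsigned long load_unaligned_zeropad_sketch(const void *addr)
{
	unsigned long offset = (unsigned long)addr & (sizeof(long) - 1);
	unsigned long aligned_word;

	/*
	 * Fast path in the real code: one unaligned load of
	 * *(unsigned long *)addr. If that load faults because it runs
	 * into an unmapped next page, the fixup synthesizes the result
	 * below instead of oopsing.
	 */
	aligned_word = *(unsigned long *)((unsigned long)addr - offset);

	/*
	 * Shift so the bytes that do exist land in the low part of the
	 * word and the bytes from the unmapped page read back as
	 * zeroes (little-endian).
	 */
	return aligned_word >> (8 * offset);
}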

csum_partial_copy_generic() uses PREFETCHT0 together with the
following macro; I think that function can be called on both kernel
and user addresses?
/*
* No _ASM_EXTABLE_UA; this is used for intentional prefetch on a
* potentially unmapped kernel address.
*/
.macro ignore L=.Lignore
30:
_ASM_EXTABLE(30b, \L)
.endm
(That comment says "kernel address", but I wrote that, and I think
what I really meant is "kernel or userspace address".)

> I was hoping I could get rid of the
> check-maximum-address stuff, but it's needed for architectures where
> the user range is adjacent to the kernel range (i.e. not x86_64).

> Jann, I'm still unhappy that this code will write up to sizeof(long)-1
> user-controlled garbage bytes in-bounds past the null-terminator in
> the kernel buffer. Do you think that's worth changing, too? I don't
> think it's a bug per se, but it seems like a nifty little wart for an
> attacker to try to abuse.

Hm. I guess it might be interesting if some code path first memsets a
kernel buffer to all-zeroes, then uses strncpy_from_user() to copy
into it, and then exposes the entire buffer to a different user task
(or, if strncpy_from_user() was called on kernel memory, to any user
task).
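
Something like this made-up pattern (not pointing at any real code
path; the names are invented):

	char name[32];
	long n;

	memset(name, 0, sizeof(name));	/* caller assumes the tail stays zero */

	n = strncpy_from_user(name, user_src, sizeof(name));
	if (n < 0)
		return n;

	/*
	 * Up to sizeof(long)-1 user-controlled bytes past the NUL can
	 * now sit inside name[]; dumping the whole buffer back out
	 * exposes them to whoever reads it.
	 */
	if (copy_to_user(user_dst, name, sizeof(name)))
		return -EFAULT;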

And if the source is kernel memory (which can happen e.g. via splice,
since there are VFS write handlers that use strncpy_from_user()), it
could potentially be used to make an uninitialized memory read bug
(constrained to a specific field in some slab object, or something
like that) more powerful, by shoving out-of-bounds kernel data around
such that it is aligned properly for the leak.

So, yeah, I think if it's not too much trouble, changing that would be nice.