Re: [kernel-hardening] [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches

From: Jann Horn
Date: Fri Jan 31 2020 - 07:04:09 EST


On Thu, Jan 30, 2020 at 8:23 PM Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> On Wed, Jan 29, 2020 at 06:19:56PM +0100, Christian Borntraeger wrote:
> > On 29.01.20 18:09, Christoph Hellwig wrote:
> > > On Wed, Jan 29, 2020 at 06:07:14PM +0100, Christian Borntraeger wrote:
> > >>> DMA can be done to NORMAL memory as well.
> > >>
> > >> Exactly.
> > >> I think iucv uses GFP_DMA because z/VM needs those buffers to reside below 2GB (which is ZONA_DMA for s390).
> > >
> > > The normal way to allocate memory with addressing limits would be to
> > > use dma_alloc_coherent and friends. Any chance to switch iucv over to
> > > that? Or is there no device associated with it?
> >
> > There is not necessarily a device for that. It is a hypervisor interface (an
> > instruction that is interpreted by z/VM). We do have the netiucv driver that
> > creates a virtual nic, but there is also AF_IUCV which works without a device.
> >
> > But back to the original question: If we mark kmalloc caches as usercopy caches,
> > we should do the same for DMA kmalloc caches. As outlined by Christoph, this has
> > nothing to do with device DMA.
>
> Hm, looks like it's allocated from the low 16MB. Seems like poor naming!

The physical address limit of the DMA zone depends on the architecture
(and the kernel version); e.g. on Linux 4.4 on arm64 (which is used on
the Pixel 2), the DMA zone goes up to 4GiB. Later, arm64 started using
the DMA32 zone for that instead (as was already the case on e.g.
X86-64); but recently (commit 1a8e1cef7603), arm64 started using the
DMA zone again, but now for up to 1GiB. (AFAICS the DMA32 zone can't
be used with kmalloc at all, that only works with the DMA zone.)

> :) There seems to be a LOT of stuff using GFP_DMA, and it seems unlikely
> those are all expecting low addresses?

I think there's a bunch of (especially really old) hardware where the
hardware can only talk to low physical addresses, e.g. stuff that uses
the ISA bus.

However, there aren't *that* many users of GFP_DMA that actually cause
kmalloc allocations with GFP_DMA - many of the users of GFP_DMA
actually just pass that flag to dma_alloc_coherent()/dma_pool_alloc(),
where it is filtered away and the allocation ultimately doesn't go
through the slab allocator AFAICS, or they pass it directly to the
page allocator, where it has no effect on the usercopy stuff. Looking
on my workstation, there are zero objects allocated in dma-kmalloc-*
slabs:

/sys/kernel/slab# for name in dma-kmalloc-*; do echo "$name: $(cat
$name/objects)"; done
dma-kmalloc-128: 0
dma-kmalloc-16: 0
dma-kmalloc-192: 0
dma-kmalloc-1k: 0
dma-kmalloc-256: 0
dma-kmalloc-2k: 0
dma-kmalloc-32: 0
dma-kmalloc-4k: 0
dma-kmalloc-512: 0
dma-kmalloc-64: 0
dma-kmalloc-8: 0
dma-kmalloc-8k: 0
dma-kmalloc-96: 0

On a Pixel 2, there are a whole five objects allocated across the DMA
zone kmalloc caches:

walleye:/sys/kernel/slab # for name in dma-kmalloc-*; do echo "$name:
$(cat $name/objects)"; done
dma-kmalloc-1024: 0
dma-kmalloc-128: 0
dma-kmalloc-2048: 2
dma-kmalloc-256: 0
dma-kmalloc-4096: 3
dma-kmalloc-512: 0
dma-kmalloc-8192: 0

> Since this has only been a problem on s390, should just s390 gain the
> weakening of the usercopy restriction? Something like:
>
>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 1907cb2903c7..c5bbc141f20b 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1303,7 +1303,9 @@ void __init create_kmalloc_caches(slab_flags_t flags)
> kmalloc_caches[KMALLOC_DMA][i] = create_kmalloc_cache(
> kmalloc_info[i].name[KMALLOC_DMA],
> kmalloc_info[i].size,
> - SLAB_CACHE_DMA | flags, 0, 0);
> + SLAB_CACHE_DMA | flags, 0,
> + IS_ENABLED(CONFIG_S390) ?
> + kmalloc_info[i].size : 0);
> }
> }
> #endif

I think dma-kmalloc slabs should be handled the same way as normal
kmalloc slabs. When a dma-kmalloc allocation is freshly created, it is
just normal kernel memory - even if it might later be used for DMA -,
and it should be perfectly fine to copy_from_user() into such
allocations at that point, and to copy_to_user() out of them at the
end. If you look at the places where such allocations are created, you
can see things like kmemdup(), memcpy() and so on - all normal
operations that shouldn't conceptually be different from usercopy in
any relevant way.