Re: [kernel-hardening] [PATCH 09/38] usercopy: Mark kmalloc caches as usercopy caches

From: Jann Horn
Date: Sat Feb 01 2020 - 14:28:22 EST


[pruned bogus addresses from recipient list]

On Sat, Feb 1, 2020 at 6:56 PM Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> On Fri, Jan 31, 2020 at 01:03:40PM +0100, Jann Horn wrote:
> > I think dma-kmalloc slabs should be handled the same way as normal
> > kmalloc slabs. When a dma-kmalloc allocation is freshly created, it is
> > just normal kernel memory - even if it might later be used for DMA -,
> > and it should be perfectly fine to copy_from_user() into such
> > allocations at that point, and to copy_to_user() out of them at the
> > end. If you look at the places where such allocations are created, you
> > can see things like kmemdup(), memcpy() and so on - all normal
> > operations that shouldn't conceptually be different from usercopy in
> > any relevant way.
>
> I can't find where the address limit for dma-kmalloc is implemented.

dma-kmalloc is a slab that uses GFP_DMA pages.

Things have changed a bit through the kernel versions, but in current
mainline, the zone limit for GFP_DMA is reported from arch code to
generic code via zone_dma_bits, from where it is used to decide which
zones should be used for allocations based on the address limit of a
given device:

kernel/dma/direct.c:
/*
* Most architectures use ZONE_DMA for the first 16 Megabytes, but some use it
* it for entirely different regions. In that case the arch code needs to
* override the variable below for dma-direct to work properly.
*/
unsigned int zone_dma_bits __ro_after_init = 24;
[...]
static gfp_t __dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
u64 *phys_limit)
{
[...]
/*
* Optimistically try the zone that the physical address mask falls
* into first. If that returns memory that isn't actually addressable
* we will fallback to the next lower zone and try again.
*
* Note that GFP_DMA32 and GFP_DMA are no ops without the corresponding
* zones.
*/
if (*phys_limit <= DMA_BIT_MASK(zone_dma_bits))
return GFP_DMA;
if (*phys_limit <= DMA_BIT_MASK(32))
return GFP_DMA32;
return 0;
}


There are only a few architectures that override the limit:

powerpc:
/*
* Allow 30-bit DMA for very limited Broadcom wifi chips on many
* powerbooks.
*/
if (IS_ENABLED(CONFIG_PPC32))
zone_dma_bits = 30;
else
zone_dma_bits = 31;

s390:
zone_dma_bits = 31;

and arm64:
#define ARM64_ZONE_DMA_BITS 30
[...]
if (IS_ENABLED(CONFIG_ZONE_DMA)) {
zone_dma_bits = ARM64_ZONE_DMA_BITS;
arm64_dma_phys_limit = max_zone_phys(ARM64_ZONE_DMA_BITS);
}


The actual categorization of page ranges into zones happens via
free_area_init_nodes() or free_area_init_node(); these are provided
with arrays of maximum physical addresses or zone sizes (depending on
which of them is called) by arch-specific code.
For arm64, the caller is zone_sizes_init(). X86 does it in zone_sizes_init().

> As to whitelisting all of dma-kmalloc -- I guess I can be talked into
> it. It still seems like the memory used for direct hardware
> communication shouldn't be exposed to userspace, but it we're dealing
> with packet data, etc, then it makes sense not to have to have bounce
> buffers, etc.

FWIW, as far as I understand, usercopy doesn't actually have any
effect on drivers that use the modern, proper APIs, since those don't
use the slab allocator at all - as I pointed out in my last mail, the
dma-kmalloc* slabs are used very rarely. (Which is good, because
putting objects from less-than-page-size slabs into iommu entries is a
terrible idea from a security and reliability perspective because it
gives the hardware access to completely unrelated memory.) Instead,
they get pages from the page allocator, and these pages may e.g. be
allocated from the DMA, DMA32 or NORMAL zones depending on the
restrictions imposed by hardware. So I think the usercopy restriction
only affects a few oddball drivers (like this s390 stuff), which is
why you're not seeing more bug reports caused by this.