Re: sudo x86info -a => kernel BUG at mm/usercopy.c:78!

From: Andy Lutomirski
Date: Fri Mar 31 2017 - 14:57:46 EST


On Fri, Mar 31, 2017 at 11:03 AM, Dave Jones <davej@xxxxxxxxxxxxxxxxx> wrote:
> On Fri, Mar 31, 2017 at 10:32:04AM -0700, Kees Cook wrote:
>
> > > > > > > Full dmesg output here: https://pastebin.com/raw/Kur2mpZq
> > > > > > >
> > > > > > > [ 51.418954] usercopy: kernel memory exposure attempt detected from
> > > > > > > ffff880000090000 (dma-kmalloc-256) (4096 bytes)
> > > > > >
> > > > > > This seems like a real exposure: the copy is attempting to read 4096
> > > > > > bytes from a 256 byte object.
> > > > >
> > > > > The code[1] is doing a 4k read from /dev/mem in the range 0x90000 -> 0xa0000
> > > > > According to arch/x86/mm/init.c:devmem_is_allowed, that's still valid..
> > > > >
> > > > > Note that the printk is using the direct mapping address. Is that what's
> > > > > being passed down to devmem_is_allowed now ? If so, that's probably what broke.
> > > >
> > > > So this is attempting to read physical memory 0x90000 -> 0xa0000, but
> > > > that's somehow resolving to a virtual address that is claimed by
> > > > dma-kmalloc?? I'm confused how that's happening...
> > >
> > > /dev/mem is using physical addresses that the kernel translates through the
> > > direct mapping. __check_object_size seems to think that anything passed
> > > into it is always allocated by the kernel, but in this case, I think read_mem()
> > > is just passing through the direct mapping to copy_to_user.
> >
> > How is ffff880000090000 both in the direct mapping and a slab object?
> >
> > It would need to pass all of these checks, and be marked as PageSlab
> > before it could be evaluated by __check_heap_object:
> >
> > if (is_vmalloc_or_module_addr(ptr))
> >         return NULL;
> >
> > if (!virt_addr_valid(ptr))
> >         return NULL;
> >
> > page = virt_to_head_page(ptr);
> >
> > /* Check slab allocator for flags and size. */
> > if (PageSlab(page))
> >         return __check_heap_object(ptr, n, page);
>
> Looking at Tommi's dmesg output closer, it appears that he's booting in
> EFI mode (which isn't unusual these days). I'm not sure that the EBDA
> (that x86info is trying to read) even exists under EFI, which is
> probably why the memory range is showing up as usable, and then ending
> up as a slab page, rather than being reserved by the BIOS.
>

This stuff all sucks. Presumably the only reason that we pay
attention to the EBDA at all in EFI mode is that no one has the guts
to change it: maybe there's a firmware out there that puts something
important in the EBDA and fails to properly reserve it in the EFI
memory map.

> ...
> reserve setup_data: [mem 0x0000000000059000-0x000000000009dfff] usable
> ...
>
> If EBDA under EFI isn't a valid thing, the puzzling part is why there's
> still an EBDA pointer in lowmem. x86 people ?
>
> Longterm, I think I'm just going to gut all the ebda code from x86info,
> as it isn't really necessary. Whether we still need to change /dev/mem
> to cope with this situation depends on whether there are other valid
> usecases.

I would like to at least consider a stricter alternative: make
/dev/mem a real whitelist. The rules would be that, by default,
/dev/mem access is always rejected. Kernel code could explicitly
register resources that would be permitted via /dev/mem -- each
resource would be tagged with a bit saying "devmem okay" along with
some indication of caching mode. For example, on very recent kernels,
some crappy HP tools are busted because they try to access SMBIOS
using explicit uncached devmem accesses, but that's verboten because
the kernel already maps it cached with ioremap_cache(), and a
conflicting cache type for the same range is refused.

There are really very few cases where /dev/mem is okay at all, I
think. Maybe the EBDA is one of them. And we could make up some hack
where devmem access to certain ranges just gets all zeros regardless
of what's actually there.
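
To make the whitelist idea concrete, here is a rough userspace sketch,
not kernel code. Every name in it (devmem_register, devmem_check, the
enums, the table size) is made up for illustration, and the addresses
in the usage note are arbitrary examples. It only models the policy:
deny by default, allow a registered range when the requested caching
mode matches, and optionally hand back zeros for a range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical caching modes a registered range can require. */
enum devmem_cache_mode { DEVMEM_CACHED, DEVMEM_UNCACHED };

/* Hypothetical outcome of a /dev/mem access check. */
enum devmem_verdict { DEVMEM_DENY, DEVMEM_ALLOW, DEVMEM_ZEROES };

struct devmem_range {
	uint64_t start;               /* first physical address, inclusive */
	uint64_t end;                 /* last physical address, inclusive */
	enum devmem_cache_mode mode;  /* only this caching mode is allowed */
	bool read_as_zero;            /* reads return zeros, not real data */
};

#define DEVMEM_MAX_RANGES 16
static struct devmem_range devmem_ranges[DEVMEM_MAX_RANGES];
static size_t devmem_nr_ranges;

/* Tag a physical range as "devmem okay". Returns false if the table is full. */
bool devmem_register(uint64_t start, uint64_t end,
		     enum devmem_cache_mode mode, bool read_as_zero)
{
	assert(start <= end);
	if (devmem_nr_ranges == DEVMEM_MAX_RANGES)
		return false;
	devmem_ranges[devmem_nr_ranges++] =
		(struct devmem_range){ start, end, mode, read_as_zero };
	return true;
}

/* Default-deny check: the access must fit entirely inside one registered range. */
enum devmem_verdict devmem_check(uint64_t addr, uint64_t len,
				 enum devmem_cache_mode mode)
{
	uint64_t last;
	size_t i;

	/* Reject empty accesses and accesses that wrap the address space. */
	if (len == 0 || addr + len - 1 < addr)
		return DEVMEM_DENY;
	last = addr + len - 1;

	for (i = 0; i < devmem_nr_ranges; i++) {
		const struct devmem_range *r = &devmem_ranges[i];

		if (addr < r->start || last > r->end)
			continue;
		if (r->read_as_zero)
			return DEVMEM_ZEROES;
		return r->mode == mode ? DEVMEM_ALLOW : DEVMEM_DENY;
	}
	return DEVMEM_DENY;
}
```

With nothing registered, devmem_check(0x90000, 4096, DEVMEM_CACHED)
is denied. After devmem_register(0x90000, 0x9ffff, DEVMEM_CACHED, true),
the same read would get zeros instead.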

--Andy