Re: [PATCH 5/7] x86/mm/init: remove freed kernel image areas from alias mapping

From: Andy Lutomirski
Date: Sat Aug 04 2018 - 17:38:31 EST


On Thu, Aug 2, 2018 at 3:58 PM, Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx> wrote:
>
> From: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
>
> The kernel image is mapped into two places in the virtual address
> space (addresses without KASLR, of course):
>
> 1. The kernel direct map (0xffff880000000000)
> 2. The "high kernel map" (0xffffffff81000000)
>
> We actually execute out of #2. If we get the address of a kernel
> symbol, it points to #2, but almost all physical-to-virtual
> translations point to #1.
>
> Parts of the "high kernel map" alias are mapped in the userspace
> page tables with the Global bit for performance reasons. The
> parts that we map to userspace do not (er, should not) have
> secrets.
>
> This is fine, except that some areas in the kernel image that
> are adjacent to the non-secret-containing areas are unused holes.
> We free these holes back into the normal page allocator and
> reuse them as normal kernel memory. The memory will, of course,
> get *used* via the normal map, but the alias mapping is kept.
>
> This otherwise unused alias mapping of the holes will, by default
> keep the Global bit, be mapped out to userspace, and be
> vulnerable to Meltdown.
>
> Remove the alias mapping of these pages entirely. This is likely
> to fracture the 2M page mapping the kernel image near these areas,
> but this should affect a minority of the area.
>
> The pageattr code changes *all* aliases mapping the physical pages
> that it operates on (by default). We only want to modify a single
> alias, so we need to tweak its behavior.
>
> This unmapping behavior is currently dependent on PTI being in
> place. Going forward, we should at least consider doing this for
> all configurations. Having an extra read-write alias for memory
> is not exactly ideal for debugging things like random memory
> corruption and this does undercut features like DEBUG_PAGEALLOC
> or future work like eXclusive Page Frame Ownership (XPFO).
>

I like this patch, and I tend to think we should (eventually) enable
it regardless of PTI. Cleaning up the memory map is generally a good
thing.

Also, just to make sure I fully understand: the kernel text is aliased
in both the direct map and the high map, right? This means that we
should be able to make the high kernel mapping have proper RO
permissions very early without breaking text_poke() at the minor cost
of needing to force a serializing instruction at the end of each big
block of text pokes. I think this would be worthwhile, although I
suspect we'll uncover *tons* of bugs in the process.