Re: [PATCH] x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area.
From: Steve Wahl
Date: Tue Sep 10 2019 - 10:38:47 EST
On Tue, Sep 10, 2019 at 08:18:15AM +0200, Ingo Molnar wrote:
> * Kirill A. Shutemov <kirill@xxxxxxxxxxxxx> wrote:
> > On Fri, Sep 06, 2019 at 04:29:50PM -0500, Steve Wahl wrote:
> > > Our hardware (UV aka Superdome Flex) has address ranges marked
> > > reserved by the BIOS. These ranges can cause the system to halt if
> > > accessed.
> > >
> > > During kernel initialization, the processor was speculating into
> > > reserved memory causing system halts. The processor speculation is
> > > enabled because the reserved memory is being mapped by the kernel.
> > >
> > > The page table level2_kernel_pgt is 1 GiB in size, and had all pages
> > > initially marked as valid, and the kernel is placed anywhere in this
> > > range depending on the virtual address selected by KASLR. Later on in
> > > the boot process, the valid area gets trimmed back to the space
> > > occupied by the kernel.
> > >
> > > But during the interval of time when the full 1 GiB space was marked
> > > as valid, if the kernel physical address chosen by KASLR was close
> > > enough to our reserved memory regions, the valid pages outside the
> > > actual kernel space were allowing the processor to issue speculative
> > > accesses to the reserved space, causing the system to halt.
> > >
> > > This was encountered somewhat rarely on a normal system boot, and
> > > somewhat more often when starting the crash kernel if
> > > "crashkernel=512M,high" was specified on the command line (because
> > > this heavily restricts the physical address of the crash kernel,
> > > usually to within 1 GiB of our reserved space).
> > >
> > > The answer is to invalidate the pages of this table outside the
> > > address range occupied by the kernel before the page table is
> > > activated. This patch has been validated to fix this problem on our
> > > hardware.
> > If the goal is to avoid *any* mapping of the reserved region to stop
> > speculation, I don't think this patch will do the job. We still (likely)
> > have the same memory mapped as part of the identity mapping. And it
> > happens at least in two places: here and before on decompression stage.
> Yeah, this really needs a fix at the KASLR level: it should only ever map
> into regions that are fully RAM backed.
> Is the problem that the 1 GiB mapping is a direct mapping, which can be
> speculated into? I presume KASLR won't accidentally map the kernel into
> the reserved region, right?
I believe you are correct. There is code that limits KASLR's choice
of physical addresses to valid RAM locations. There are no bugs in it
that I've seen.
It's just that the 1 GiB mapping includes wide regions of physical
address space on either or both sides of the chosen physical space for
the kernel, which are not limited to valid RAM regions, allowing
speculative accesses into reserved regions if the chosen kernel
physical address is close enough to one of them.
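To make the arithmetic concrete, here is a rough userspace sketch of the
idea, not the patch itself. It assumes the table has 512 entries of 2 MiB
each (512 * 2 MiB = 1 GiB) and models each entry as a plain 64-bit word
with a present bit. The names trim_pmd_table() and pmd_index_of() are
hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PMD_SHIFT     21        /* each entry maps 2 MiB */
#define PTRS_PER_PMD  512       /* 512 entries * 2 MiB = 1 GiB window */
#define PAGE_PRESENT  0x1ULL    /* "valid" bit in this simplified model */

/*
 * Hypothetical helper: index of the 2 MiB entry that covers a byte
 * offset measured from the start of the 1 GiB window.
 */
static unsigned pmd_index_of(uint64_t offset)
{
    return (unsigned)((offset >> PMD_SHIFT) & (PTRS_PER_PMD - 1));
}

/*
 * Hypothetical helper: clear the present bit on every entry outside
 * the inclusive range [first, last], which stands in for the entries
 * covering the kernel image. Returns how many entries stay present.
 */
static int trim_pmd_table(uint64_t *pmd, unsigned first, unsigned last)
{
    int kept = 0;

    assert(first <= last && last < PTRS_PER_PMD);

    for (unsigned i = 0; i < PTRS_PER_PMD; i++) {
        if (i < first || i > last)
            pmd[i] &= ~PAGE_PRESENT;    /* outside the kernel: invalidate */
        else if (pmd[i] & PAGE_PRESENT)
            kept++;                     /* inside the kernel: leave mapped */
    }
    return kept;
}
```

As an example under these assumptions: with every entry initially
present and a kernel image spanning offsets 16 MiB through 58 MiB, the
covering entries are 8 through 29, so 22 of the 512 entries remain valid
and the other 490 (980 MiB of the window) can no longer be speculated
into through this table.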
--> Steve Wahl
Steve Wahl, Hewlett Packard Enterprise