Re: [PATCH] x86: Use entire page for the per-cpu GDT only if paravirt-enabled

From: Andy Lutomirski
Date: Tue Sep 29 2015 - 13:36:11 EST


On Sep 29, 2015 2:01 AM, "Ingo Molnar" <mingo@xxxxxxxxxx> wrote:
>
>
> * Denys Vlasenko <dvlasenk@xxxxxxxxxx> wrote:
>
> > On 09/28/2015 09:58 AM, Ingo Molnar wrote:
> > >
> > > * Denys Vlasenko <dvlasenk@xxxxxxxxxx> wrote:
> > >
> > >> On 09/26/2015 09:50 PM, H. Peter Anvin wrote:
> > >>> NAK. We really should map the GDT read-only on all 64 bit systems,
> > >>> since we can't hide the address from SGDT. Same with the IDT.
> > >>
> > >> Sorry, I don't understand your point.
> > >
> > > So the problem is that right now the SGDT instruction (which is unprivileged)
> > > leaks the real address of the kernel image:
> > >
> > > fomalhaut:~> ./sgdt
> > > SGDT: ffff88303fd89000 / 007f
> > >
> > > that 'ffff88303fd89000' is a kernel address.
> >
> > Thank you.
> > I do know that SGDT and friends are unprivileged on x86
> > and thus they allow userspace (and guest kernels in paravirt)
> > to learn things they don't need to know.
> >
> > I don't see how making the GDT page-aligned and page-sized
> > changes anything in this regard. SGDT will still work,
> > and will still leak the GDT address.
>
> Well, as I tried to explain in the other part of my mail, doing so enables us to
> remap the GDT to a less security-sensitive virtual address that does not leak the
> kernel's randomized address:
>
> > > Your observation in the changelog and your patch:
> > >
> > >>>> It is page-sized because of paravirt. [...]
> > >
> > > ... conflicts with the intention to mark (remap) the primary GDT address read-only
> > > on native kernels as well.
> > >
> > > So what we should do instead is to use the page alignment properly and remap the
> > > GDT to a read-only location, and load that one.
> >
> > If we had a small GDT (i.e. what my patch does), we could still remap the
> > entire page containing the small GDT and simply not care that some other data
> > is also visible through that RO page.
>
> That's generally considered fragile: suppose an attacker has a limited information
> leak that can read absolute addresses with system privilege but he doesn't know
> the kernel's randomized base offset. With a 'partial page' mapping there could be
> function pointers near the GDT, part of the page the GDT happens to be on, that
> leak this information.
>
> (Same goes for crypto keys or other critical information (like canary information,
> salts, etc.) accidentally ending up nearby.)
>
> Arguably it's a bit tenuous, but when playing remapping games it's generally
> considered good to be page aligned and page sized, with zero padding.
>
> > > This would have a couple of advantages:
> > >
> > > - This would give kernel address randomization more teeth on x86.
> > >
> > > - An additional advantage would be that rootkits overwriting the GDT would have
> > > a bit more work to do.
> > >
> > > - A third advantage would be that for NUMA systems we could 'mirror' the GDT into
> > > node-local memory and load those. This makes GDT load cache-misses a bit less
> > > expensive.
> >
> > The GDT is per-cpu. Isn't per-cpu memory already NUMA-local?
>
> Indeed it is:
>
> fomalhaut:~> for ((cpu=1; cpu<9; cpu++)); do taskset $cpu ./sgdt ; done
> SGDT: ffff88103fa09000 / 007f
> SGDT: ffff88103fa29000 / 007f
> SGDT: ffff88103fa29000 / 007f
> SGDT: ffff88103fa49000 / 007f
> SGDT: ffff88103fa49000 / 007f
> SGDT: ffff88103fa49000 / 007f
> SGDT: ffff88103fa29000 / 007f
> SGDT: ffff88103fa69000 / 007f
>
> I confused it with the IDT, which is still global.
>
> This also means that the GDT in itself does not leak kernel addresses at the
> moment, except it leaks the layout of the percpu area.
>
> So my suggestion would be to:
>
> - make the GDT unconditionally page aligned and sized, then remap it to a
> read-only address unconditionally as well, like we do it for the IDT.
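
The `./sgdt` program used in the quoted output is not included in the thread. A minimal sketch of what such a tool might look like, assuming x86-64 and GCC/Clang inline assembly (`struct gdtr`, `format_gdtr`, `read_gdtr` and `print_gdtr` are hypothetical names): SGDT stores a 16-bit limit followed by a 64-bit base, and the formatting reproduces the quoted "SGDT: <base> / <limit>" lines. On CPUs and kernels with UMIP enabled, SGDT from user space may fault or be emulated with a dummy value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Memory image SGDT stores in 64-bit mode: 16-bit limit, then 64-bit base. */
struct __attribute__((packed)) gdtr {
    uint16_t limit;
    uint64_t base;
};

/* Format a GDTR like the quoted "SGDT: <base> / <limit>" lines.
 * Returns the snprintf() result (number of characters written). */
int format_gdtr(char *buf, size_t len, const struct gdtr *g)
{
    return snprintf(buf, len, "SGDT: %016llx / %04x",
                    (unsigned long long)g->base, (unsigned int)g->limit);
}

#if defined(__x86_64__)
/* Execute SGDT. Unprivileged unless UMIP is enabled, in which case the
 * kernel may deliver a fault or emulate it with a dummy value. */
void read_gdtr(struct gdtr *g)
{
    __asm__ volatile("sgdt %0" : "=m"(*g));
}

/* Hypothetical stand-in for the ./sgdt tool; call it from main(). */
void print_gdtr(void)
{
    struct gdtr g;
    char buf[64];

    read_gdtr(&g);
    format_gdtr(buf, sizeof(buf), &g);
    puts(buf);
}
#endif
```

Pinning such a tool with `taskset`, as in the quoted loop, shows the GDT base of whichever CPU it lands on. The limit of 0x7f means a 128-byte table (the limit is the size minus one), i.e. 16 eight-byte descriptors.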

Does anyone know what happens if you stick a non-accessed segment in
the GDT, map the GDT RO, and access it? The docs are extremely vague
on the interplay between segmentation and paging on the segmentation
structures themselves. My guess is that it causes #PF. This might
break set_thread_area users unless we change set_thread_area to force
the accessed bit on.
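
At the level of a raw 8-byte descriptor, "forcing the accessed bit on" could look like the sketch below. The helper names are hypothetical and this is not the kernel's set_thread_area code. The A flag is bit 40 (the low bit of the type field) and only has that meaning for code/data descriptors, where S (bit 44) is set. If A is clear, the CPU stores it back into the table when the segment is loaded; that store is the one whose behaviour against a read-only GDT is in question here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Accessed (A) flag: bit 40, the low bit of the type field (bits 40..43). */
#define DESC_ACCESSED_BIT (UINT64_C(1) << 40)
/* S flag: bit 44, set for code/data descriptors, clear for system ones. */
#define DESC_S_BIT        (UINT64_C(1) << 44)

/* Hypothetical: true if loading this descriptor into a segment register
 * would make the CPU write the A bit back into the descriptor table. */
bool desc_needs_accessed_write(uint64_t desc)
{
    return (desc & DESC_S_BIT) && !(desc & DESC_ACCESSED_BIT);
}

/* Hypothetical: pre-set A on code/data descriptors so that loading them
 * never requires a write to the table. System descriptors are returned
 * unchanged, since bit 40 means something else for them. */
uint64_t desc_force_accessed(uint64_t desc)
{
    if (desc & DESC_S_BIT)
        desc |= DESC_ACCESSED_BIT;
    return desc;
}
```

For example, a flat ring-3 data descriptor 0x00cff2000000ffff (A clear) becomes 0x00cff3000000ffff.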

There's a possible worse failure mode: if someone pokes an un-accessed
segment into SS or CS using sigreturn, then it's within the realm of
possibility that IRET would generate #PF (hey Intel and AMD, please
document this!). I don't think that would be rootable, but at the
very least we'd want to make sure it doesn't OOPS by either making it
impossible or adding an explicit test to sigreturn.c.
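
In a sigreturn-based test, the hostile value is a selector written into the saved CS or SS of the signal frame. A small sketch of the selector encoding, with hypothetical helper names, shows how such a test would name a particular GDT (TI=0) or LDT (TI=1) entry at RPL 3.

```c
#include <assert.h>
#include <stdint.h>

/* x86 segment selector layout: bits 15..3 index, bit 2 TI
 * (0 = GDT, 1 = LDT), bits 1..0 RPL. */
uint16_t make_selector(unsigned int index, unsigned int ti, unsigned int rpl)
{
    assert(index < 8192 && ti <= 1 && rpl <= 3);
    return (uint16_t)((index << 3) | (ti << 2) | rpl);
}

unsigned int selector_index(uint16_t sel) { return sel >> 3; }
unsigned int selector_ti(uint16_t sel)    { return (sel >> 2) & 1; }
unsigned int selector_rpl(uint16_t sel)   { return sel & 3; }
```

For reference, 0x33 (GDT entry 6) and 0x2b (GDT entry 5) are the usual 64-bit user CS and SS values on x86-64 Linux.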

hpa pointed out in another thread that the GDT *must* be writable on
32-bit kernels because we use a task gate for NMI and jumping through
a task gate writes to the GDT.
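
A sketch of the GDT write involved, with hypothetical helper names, modelling only the busy-flag update (register save/restore through the TSS is not modelled). A 32-bit TSS descriptor has type 0x9 when available and 0xB when busy, the difference being bit 41. Switching to a task through a task gate sets that flag in the new task's descriptor in place, so the GDT page holding it has to be writable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Type field: bits 40..43. For a system descriptor S (bit 44) is clear. */
#define DESC_TYPE(desc)   ((unsigned int)(((desc) >> 40) & 0xf))
#define DESC_S_BIT        (UINT64_C(1) << 44)
/* Busy flag of a TSS descriptor: bit 41 (type 0x9 -> 0xB). */
#define DESC_TSS_BUSY_BIT (UINT64_C(1) << 41)

/* True for an available (0x9) or busy (0xB) 32-bit TSS descriptor. */
bool desc_is_tss32(uint64_t desc)
{
    unsigned int type = DESC_TYPE(desc);

    return !(desc & DESC_S_BIT) && (type == 0x9 || type == 0xB);
}

/* Models the store a task switch performs on the incoming task's TSS
 * descriptor: it is marked busy in the GDT itself. */
uint64_t tss_mark_busy(uint64_t desc)
{
    return desc | DESC_TSS_BUSY_BIT;
}
```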

On another note, SGDT is considerably faster than LSL, at least on
Sandy Bridge. The vdso might be able to take advantage of that for
getcpu.
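
For comparison, a sketch of the LSL approach as I understand the x86-64 vDSO of this period: the kernel stores (node << 12) | cpu in the limit of a per-CPU GDT entry, and getcpu reads it back with LSL. The helper names are hypothetical and the selector is a parameter rather than a hard-coded constant. An SGDT-based variant would additionally need the CPU number to be recoverable from the GDT base or limit, which this sketch does not attempt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Low 12 bits of the per-CPU segment limit: CPU number; bits above: node. */
#define VGETCPU_CPU_MASK 0xfffu

void decode_getcpu(uint32_t p, unsigned int *cpu, unsigned int *node)
{
    if (cpu)
        *cpu = p & VGETCPU_CPU_MASK;
    if (node)
        *node = p >> 12;
}

#if defined(__x86_64__)
/* LSL loads the segment limit for 'sel'. If the selector is not usable,
 * LSL clears ZF and leaves the destination unchanged, so this returns 0. */
uint32_t read_segment_limit(uint16_t sel)
{
    uint32_t limit = 0;

    __asm__ volatile("lsl %1, %0" : "+r"(limit) : "r"((uint32_t)sel));
    return limit;
}
#endif
```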

--Andy