Re: [PATCH v2 1/2] x86/asm: Pin sensitive CR4 bits
From: Kees Cook
Date: Fri Jun 14 2019 - 23:24:30 EST
On Fri, Jun 14, 2019 at 04:57:36PM +0200, Thomas Gleixner wrote:
> Wouldn't it make sense to catch situations where a regular caller provides
> @val with pinned bits unset? I.e. move the OR into this code path after
> storing bits_missing.
I mentioned this in the commit log, but maybe I wasn't very clear:
> > Since these bits are global state (once established by the boot CPU
> > and kernel boot parameters), they are safe to write to secondary CPUs
> > before those CPUs have finished feature detection. As such, the bits are
> > written with an "or" performed before the register write as that is both
> > easier and uses a few bytes less storage of a location we don't have:
> > read-only per-CPU data. (Note that initialization via cr4_init_shadow()
> > isn't early enough to avoid early native_write_cr4() calls.)
Basically, there are calls of native_write_cr4() made very early in
secondary CPU init that would hit the "eek missing bit" case, and using
the cr4_init_shadow() trick mentioned by Linus still wasn't early
enough. So, since feature detection for these bits is "done" (i.e.
secondary CPUs will match the boot CPU's for these bits), it's safe to
set them "early". To avoid this, we'd need a per-cpu "am I ready to set
these bits?" state, and it'd need to be read-only-after-init, which is
not a section that exists and seems wasteful to create (4095 bytes
unused) for this feature alone.
> Something like this:
>
> unsigned long bits_missing = 0;
>
> again:
> asm volatile("mov %0,%%cr4": "+r" (val), "+m" (cr4_pinned_bits));
>
> if (static_branch_likely(&cr_pinning)) {
> if (unlikely((val & cr4_pinned_bits) != cr4_pinned_bits)) {
> bits_missing = ~val & cr4_pinned_bits;
> val |= cr4_pinned_bits;
> goto again;
> }
> /* Warn after we've set the missing bits. */
> WARN_ONCE(bits_missing, "CR4 bits went missing: %lx!?\n",
> bits_missing);
> }
Yup, that's actually exactly what I had tried. Should I try to track down
the very early callers and OR in the bits at the call sites instead? (This
had occurred to me also, but it seemed operationally equivalent to ORing
at the start of native_write_cr4(), so I didn't even bother trying it).
--
Kees Cook