Re: [patch V2 04/25] x86/apic: Make apic_pending_intr_clear() more robust

From: Andy Lutomirski
Date: Fri Jul 05 2019 - 16:40:10 EST


On Fri, Jul 5, 2019 at 1:36 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> On Fri, 5 Jul 2019, Andy Lutomirski wrote:
> > On Fri, Jul 5, 2019 at 8:47 AM Andrew Cooper <andrew.cooper3@xxxxxxxxxx> wrote:
> > > Because TPR is 0, an incoming IPI can trigger #AC, #CP, #VC or #SX
> > > without an error code on the stack, which results in a corrupt pt_regs
> > > in the exception handler, and a stack underflow on the way back out,
> > > most likely with a fault on IRET.
> > >
> > > These can be addressed by setting TPR to 0x10, which will inhibit
> > > delivery of any errant IPIs in this range, but some extra sanity logic
> > > may not go amiss. An error code on a 64bit stack can be spotted with
> > > `testb $8, %spl` due to %rsp being aligned before pushing the exception
> > > frame.
> >
> > Several years ago, I remember having a discussion with someone (Jan
> > Beulich, maybe?) about how to efficiently make the entry code figure
> > out the error code situation automatically. I suspect it was on IRC
> > and I can't find the logs. I'm thinking that maybe we should just
> > make Linux's idtentry code do something like this.
> >
> > If nothing else, we could make idtentry do:
> >
> > testl $8, %esp /* shorter than testb IIRC */
> > jz 1f /* or jnz -- too lazy to figure it out */
> > pushq $-1
> > 1:
>
> Errm, no. We should not silently paper over it. If we detect that this came
> in with a wrong stack frame, i.e. not from a CPU originated exception, then
> we truly should yell loud. Also in that case you want to check the APIC:ISR
> and issue an EOI to clear it.

It gives us the option to replace idtentry with something
table-driven. I don't think I love it, but it's not an awful idea.



>
> > > Another interesting problem is an IPI which its vector 0x80. A cunning
> > > attacker can use this to simulate system calls from unsuspecting
> > > positions in userspace, or for interrupting kernel context. At the very
> > > least the int0x80 path does an unconditional swapgs, so will try to run
> > > with the user gs, and I expect things will explode quickly from there.
> >
> > At least SMAP helps here on non-FSGSBASE systems. With FSGSBASE, I
>
> How does it help? It still crashes the kernel.
>
> > suppose we could harden this by adding a special check to int $0x80 to
> > validate GSBASE.
>
> > > One option here is to look at ISR and complain if it is found to be set.
> >
> > Barring some real hackery, we're toast long before we get far enough to
> > do that.
>
> No. We can map the APIC into the user space visible page tables for PTI
> without compromising the PTI isolation and it can be read very early on
> before SWAPGS. All you need is a register to clobber not more. It the ISR
> is set, then go into an error path, yell loudly, issue EOI and return.
> The only issue I can see is: It's slow :)
>
>

I think this will be really extremely slow. If we can restrict this
to x2apic machines, then maybe it's not so awful.

FWIW, if we just patch up the GS thing, then we are still vulnerable:
the bad guy can arrange for a privileged process to have register
state corresponding to a dangerous syscall and then send an int $0x80
via the APIC.