Re: [PATCH 1/3] x86/entry/64: Refactor IRQ stacks and make them NMI-safe

From: Andy Lutomirski
Date: Fri Jul 24 2015 - 02:09:12 EST


On Thu, Jul 23, 2015 at 3:37 PM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> This will allow IRQ stacks to nest inside NMIs or similar entries
> that can happen during IRQ stack setup or teardown.
>
> The Xen code here has a confusing comment.
>
> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
> ---
> arch/x86/entry/entry_64.S | 72 ++++++++++++++++++++++++++------------------
> arch/x86/kernel/cpu/common.c | 2 +-
> arch/x86/kernel/process_64.c | 4 +++
> 3 files changed, 47 insertions(+), 31 deletions(-)
>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index d3033183ed70..5f7df8949fa7 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -491,6 +491,39 @@ ENTRY(irq_entries_start)
> END(irq_entries_start)

This code is much more subtle than I thought when I wrote it:

>
> /*
> + * Enters the IRQ stack if we're not already using it. NMI-safe. Clobbers
> + * flags and puts old RSP into old_rsp, and leaves all other GPRs alone.
> + * Requires kernel GSBASE.
> + *
> + * The invariant is that, if irq_count != 0, then we're either on the
> + * IRQ stack or an IST stack, even if an NMI interrupts IRQ stack entry
> + * or exit.
> + */
> +.macro ENTER_IRQ_STACK old_rsp
> + movq %rsp, \old_rsp
> + cmpl $0, PER_CPU_VAR(irq_count)
> + jne 694f
> + movq PER_CPU_VAR(irq_stack_ptr), %rsp
> + /*
> + * Right now, we're on the irq stack with irq_count == 0. A nested
> + * IRQ stack switch could clobber the stack. That's fine: the stack
> + * is empty.
> + */

A nested ENTER_IRQ_STACK/LEAVE_IRQ_STACK pair is fine here. Anything
else that does PUSH (or non-IST interrupt delivery) right here is not
safe because something could interrupt *that* and do ENTER_IRQ_STACK,
thus clobbering whatever got pushed here.

In a world populated by sane people, the only things that can
interrupt here are a vmalloc fault (let's just kill that), NMI, or
MCE. But we're insane and we're talking about removing breakpoints
from the IST stack and even returning from IST entries using RET,
either of which will write something to (%rsp) and expect it not to
get clobbered.

We can't interchange the incl and the movq either: if irq_count
becomes nonzero before %rsp actually points at the IRQ stack, a
nested ENTER_IRQ_STACK sees irq_count != 0, skips the switch, and
keeps running on the old stack, breaking the invariant.

To be obviously safe against any local exception, we want a single
instruction that will change %rsp and some in-memory flag at the same
time. There aren't a whole lot of candidates. Cmpxchg isn't useful
(cmpxchg with a memory operand doesn't modify its register operand).
xchg could plausibly be abused to work, but it's slow because it's
always atomic. Enter isn't going to work without a window in which
rsp contains something bogus.

Xadd, on the other hand, just might work.

We need two percpu variables: irq_stack_ptr and irq_stack_flag.
irq_stack_ptr points to the IRQ stack and isn't modified.
irq_stack_flag == irq_stack_ptr if we're on the IRQ stack and
irq_stack_flag has any other value if we're not on the IRQ stack.
Then algebra happens. Unfortunately, the best I came up with so far
uses xadd to enter the IRQ stack and xchg to leave.

https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/entry_ist&id=36825112b6082f2711605647366cd682a6be678a

I don't love it because it probably adds 60 cycles or so to IRQs. On
the other hand, it was fun to write.

--Andy