Re: [patch V2 19/29] x86/exceptions: Split debug IST stack

From: Sean Christopherson
Date: Fri Apr 05 2019 - 16:55:52 EST


On Fri, Apr 05, 2019 at 05:07:17PM +0200, Thomas Gleixner wrote:
> The debug IST stack is actually two separate debug stacks to handle #DB
> recursion. This is required because the CPU starts always at top of stack
> on exception entry, which means on #DB recursion the second #DB would
> overwrite the stack of the first.
>
> The low level entry code therefore adjusts the top of stack on entry so a
> secondary #DB starts from a different stack page. But the stack pages are
> adjacent without a guard page between them.
>
> Split the debug stack into 3 stacks which are separated by guard pages. The
> 3rd stack is never mapped into the cpu_entry_area and is only there to
> catch triple #DB nesting:
>
> --- top of DB_stack <- Initial stack
> --- end of DB_stack
> guard page
>
> --- top of DB1_stack <- Top of stack after entering first #DB
> --- end of DB1_stack
> guard page
>
> --- top of DB2_stack <- Top of stack after entering second #DB
> --- end of DB2_stack
> guard page
>
> If DB2 would not act as the final guard hole, a second #DB would point the
> top of #DB stack to the stack below #DB1 which would be valid and not catch
> the not so desired triple nesting.
>
> The backing store does not allocate any memory for DB2 and its guard page
> as it is not going to be mapped into the cpu_entry_area.
>
> - Adjust the low level entry code so it adjusts top of #DB with the offset
> between the stacks instead of exception stack size.
>
> - Make the dumpstack code aware of the new stacks.
>
> - Adjust the in_debug_stack() implementation and move it into the NMI code
> where it belongs. As this is NMI hotpath code, it just checks the full
> area between top of DB_stack and bottom of DB1_stack without checking
> for the guard page. That's correct because the NMI cannot hit a
> stackpointer pointing to the guard page between DB and DB1 stack. Even
> if it would, then the NMI operation still is unaffected, but the resume
> of the debug exception on the topmost DB stack will crash by touching
> the guard page.
>
> Suggested-by: Andy Lutomirski <luto@xxxxxxxxxx>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> ---

...

> +static bool notrace is_debug_stack(unsigned long addr)
> +{
> + struct cea_exception_stacks *cs = __this_cpu_read(cea_exception_stacks);
> + unsigned long top = CEA_ESTACK_TOP(cs, DB);
> + unsigned long bot = CEA_ESTACK_BOT(cs, DB1);
> +
> + if (__this_cpu_read(debug_stack_usage))
> + return true;
> + /*
> + * Note, this covers the guard page between DB and DB1 as well to
> + * avoid two checks. But by all means @addr can never point into
> + * the guard page.
> + */
> + return addr > bot && addr < top;

Isn't this an off by one error? I.e. "return addr >= bot && addr < top".
%rsp == bot is technically still in the DB1 stack even though the next
PUSH/CALL will explode on the guard page.


> +}
> +NOKPROBE_SYMBOL(is_debug_stack);
> #endif
>
> dotraplinkage notrace void
> --- a/arch/x86/mm/cpu_entry_area.c
> +++ b/arch/x86/mm/cpu_entry_area.c
> @@ -98,10 +98,12 @@ static void __init percpu_setup_exceptio
>
> /*
> * The exceptions stack mappings in the per cpu area are protected
> - * by guard pages so each stack must be mapped separately.
> + * by guard pages so each stack must be mapped separately. DB2 is
> + * not mapped; it just exists to catch triple nesting of #DB.
> */
> cea_map_stack(DF);
> cea_map_stack(NMI);
> + cea_map_stack(DB1);
> cea_map_stack(DB);
> cea_map_stack(MCE);
> }
>
>