[PATCH] x86/orc: Don't bail on stack overflow

From: Andy Lutomirski
Date: Sat Nov 25 2017 - 12:28:42 EST


If we overflow the stack into a guard page and then try to unwind
it with ORC, it should work perfectly: by construction, there can't
be any meaningful data in the guard page because no writes to the
guard page will have succeeded.

ORC seems entirely capable of unwinding in this situation, except
that it doesn't even try. Adjust its initial stack check so that
it's willing to try unwinding.

I tested this by intentionally overflowing the task stack. The
result is an accurate call trace instead of a trace consisting
purely of '?' entries.

Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
---

Hi all-

Ingo, this would have fixed half the debugging problem you had, I think.
To really nail it, we'd want some kind of magic to annotate the trace
so that page_fault (and async_page_fault) entries show CR2 and error_code.

Josh, any ideas of how to do that cleanly? We could easily hard-code it
in the OOPS unwinder, I guess.

arch/x86/kernel/unwind_orc.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c
index a3f973b2c97a..7f6e3935666b 100644
--- a/arch/x86/kernel/unwind_orc.c
+++ b/arch/x86/kernel/unwind_orc.c
@@ -553,8 +553,18 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task,
 	}
 
 	if (get_stack_info((unsigned long *)state->sp, state->task,
-			   &state->stack_info, &state->stack_mask))
-		return;
+			   &state->stack_info, &state->stack_mask)) {
+		/*
+		 * We weren't on a valid stack. It's possible that
+		 * we overflowed a valid stack into a guard page.
+		 * See if the next page up is valid so that we can
+		 * generate some kind of backtrace if this happens.
+		 */
+		void *next_page = (void *)PAGE_ALIGN((unsigned long)state->sp);
+		if (get_stack_info(next_page, state->task, &state->stack_info,
+				   &state->stack_mask))
+			return;
+	}
 
 	/*
 	 * The caller can provide the address of the first frame directly
--
2.13.6