Re: [PATCH] x86/orc: Don't bail on stack overflow

From: Andy Lutomirski
Date: Sat Nov 25 2017 - 13:26:53 EST


On Sat, Nov 25, 2017 at 9:28 AM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
> If we overflow the stack into a guard page and then try to unwind
> it with ORC, it should work perfectly: by construction, there can't
> be any meaningful data in the guard page because no writes to the
> guard page will have succeeded.
>
> ORC seems entirely capable of unwinding in this situation, except
> that it doesn't even try. Adjust its initial stack check so that
> it's willing to try unwinding.
>
> I tested this by intentionally overflowing the task stack. The
> result is an accurate call trace instead of a trace consisting
> purely of '?' entries.
>
> Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
> ---
>
> Hi all-
>
> Ingo, this would have fixed half the debugging problem you had, I think.
> To really nail it, we'd want some kind of magic to annotate the trace
> so that page_fault (and async_page_fault) entries show CR2 and error_code.
>
> Josh, any ideas of how to do that cleanly? We could easily hard-code it
> in the OOPS unwinder, I guess.

Actually, this does pretty well. We don't get CR2, but when I added
an intentional bug similar to the one you debugged, the intermediate
page fault successfully dumps all the regs in the stack trace, so we
get the faulting instruction *and* the registers.
We also get ORIG_RAX, which tells us the error code. We could be
fancy and decode that.
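
For illustration only, here is a rough userspace sketch of what decoding
that value could look like. The bit positions are the architectural x86
page-fault error code bits; the macro names and the helper
pf_decode_error_code() are made up for this example and are not existing
kernel code, and nothing here is part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Architectural x86 page-fault error code bits. Macro names are illustrative. */
#define PF_EC_PROT	(1UL << 0)	/* 0: page not present, 1: protection violation */
#define PF_EC_WRITE	(1UL << 1)	/* 0: read access, 1: write access */
#define PF_EC_USER	(1UL << 2)	/* 0: supervisor mode, 1: user mode */
#define PF_EC_RSVD	(1UL << 3)	/* reserved bit set in a paging-structure entry */
#define PF_EC_INSTR	(1UL << 4)	/* fault was an instruction fetch */
#define PF_EC_PK	(1UL << 5)	/* protection-key violation */

/*
 * Hypothetical helper: render a page-fault error code (the value seen in
 * the ORIG_RAX slot of the dumped regs) as human-readable text in buf.
 */
static void pf_decode_error_code(unsigned long ec, char *buf, size_t len)
{
	const char *access;

	assert(buf != NULL && len > 0);

	/* An instruction fetch takes precedence over the read/write bit. */
	if (ec & PF_EC_INSTR)
		access = "instruction fetch";
	else if (ec & PF_EC_WRITE)
		access = "write";
	else
		access = "read";

	snprintf(buf, len, "%s, %s, %s mode%s%s",
		 access,
		 (ec & PF_EC_PROT) ? "protection violation" : "not-present page",
		 (ec & PF_EC_USER) ? "user" : "supervisor",
		 (ec & PF_EC_RSVD) ? ", reserved bit set" : "",
		 (ec & PF_EC_PK) ? ", protection key" : "");
}
```

With this sketch, an error code of 0x2 (a kernel-mode write to an
unmapped page) decodes to "write, not-present page, supervisor mode".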