Re: [PATCH 10/19] x86/dumpstack: add get_stack_info() interface

From: Andy Lutomirski
Date: Fri Jul 22 2016 - 19:52:35 EST

On Fri, Jul 22, 2016 at 4:26 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> On Thu, Jul 21, 2016 at 2:21 PM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
>> valid_stack_ptr() is buggy: it assumes that all stacks are of size
>> THREAD_SIZE, which is not true for exception stacks. So the
>> walk_stack() callbacks will need to know the location of the beginning
>> of the stack as well as the end.
>> Another issue is that in general the various features of a stack (type,
>> size, next stack pointer, description string) are scattered around in
>> various places throughout the stack dump code.
> I finally figured out what visit_info is. But would it make more
> sense to track it in the unwind state so that the unwinder can
> directly make sure it doesn't start looping?

I just realized that it *is* in the unwind state. But maybe this code
in update_stack_state:

sp = info->next;
if (!sp).
goto unknown;

if (get_stack_info(sp, state->task, info, &state->stack_mask))
goto unknown;

if (!on_stack(info, addr, len))
goto unknown;

should do something like:

if (get_stack_info(addr, ...))
goto unknown.

sp = info->end;

instead. Alternatively, maybe it would make sense to keep sp as is
(have update_stack_state return bool instead of returning a pointer)
so that a frame that switches stacks still shows the actual sp at the
time that the frame called whatever the it called.

I'm really quite confused by what state->sp means in your current
code. In the non-stack-switching case (everything is on the thread
stack), it appears to always match state->bp. Am I missing something?
If I'm understanding this correctly, when you're pointing at a call
frame, state->bp is that frame's base address (the top of the stack
frame), unwind_get_return_address() returns the address to which that
frame would return, and, in the future, unwind_get_gpr(UNWIND_DI) or
whatever it ends up looking like will return RDI at the time that the
frame called whatever function it called, if known. By that logic,
shouldn't state->sp be sp on entry to the call instruction? (Or could
sp just be removed? Does it do anything?)

I guess the reason I'm still not 100% comfortable with the idea that
pt_regs frames don't exist a real frames is that I keep waffling as to
how I should think about the regs associated with a frame in the
future DWARF world. I think I imagine them being the regs at the time
that the frame did it's call to the next frame, which, by an
admittedly weak analogy, means that the pt_regs state would be the
regs at the time that the exception or interrupt happened. That
offers a third silly option for dealing with the annoying '?': emit
two frames for a pt_regs, but have the frame in the entry code show
NULL for its return address because it's not a normal return.

> And please remove test_and_set_bit() -- it's pointlessly slow.
>> +static bool in_hardirq_stack(unsigned long *stack, struct stack_info *info,
>> + unsigned long *visit_mask)
>> +{
>> + unsigned long *begin = (unsigned long *)this_cpu_read(hardirq_stack);
>> + unsigned long *end = begin + (THREAD_SIZE / sizeof(long));
>> +
>> + if (stack < begin || stack >= end)
>> + return false;
>> +
>> + if (visit_mask && test_and_set_bit(STACK_TYPE_IRQ, visit_mask))
>> + return false;
>> +
>> + info->type = STACK_TYPE_IRQ;
>> + info->begin = begin;
>> + info->end = end;
>> + info->next = (unsigned long *)*begin;
> This works, but it's a bit magic. I don't suppose we could get rid of
> this ->next thing entirely and teach show_stack_log_lvl(), etc. to
> move from stack to stack by querying the stack type of whatever the
> frame base address is if the frame base address ends up being out of
> bounds for the current stack? Or maybe the unwinder could even do
> this by itself.
>> +static bool in_exception_stack(unsigned long *s, struct stack_info *info,
>> + unsigned long *visit_mask)
>> {
>> unsigned long stack = (unsigned long)s;
>> unsigned long begin, end;
>> @@ -44,55 +63,62 @@ static unsigned long *in_exception_stack(unsigned long *s, char **name,
>> if (stack < begin || stack >= end)
>> continue;
>> - if (test_and_set_bit(k, visit_mask))
>> + if (visit_mask &&
>> + test_and_set_bit(STACK_TYPE_EXCEPTION + k, visit_mask))
>> return false;
>> - *name = exception_stack_names[k];
>> - return (unsigned long *)end;
>> + info->type = STACK_TYPE_EXCEPTION + k;
>> + info->begin = (unsigned long *)begin;
>> + info->end = (unsigned long *)end;
>> + info->next = (unsigned long *)info->end[-2];
> This is so magical that I don't immediately see why it's correct.
> Presumably it's because the thing two slots down from the top of the
> stack is regs->sp? If so, that needs a comment.
> But again, couldn't we use the fact that we now know how to decode
> pt_regs to avoid needing this? I can imagine it being useful as a
> fallback in the event that the unwinder fails, but this is just a
> fallback. Also, NMI is weird and I'm wondering whether this works at
> all when trying to unwind from a looped NMI.
> Fixing this up could be a followup after this series is in, I think --
> you're preserving existing behavior AFAICS. I just don't particularly
> like the existing behavior.
> FWIW, I think this code needs to be explicitly tested for the 32-bit
> double fault case. It's highly magical.

Andy Lutomirski
AMA Capital Management, LLC