Re: [PATCH 10/19] x86/dumpstack: add get_stack_info() interface

From: Josh Poimboeuf
Date: Fri Jul 22 2016 - 19:55:09 EST


On Fri, Jul 22, 2016 at 04:26:46PM -0700, Andy Lutomirski wrote:
> On Thu, Jul 21, 2016 at 2:21 PM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> > valid_stack_ptr() is buggy: it assumes that all stacks are of size
> > THREAD_SIZE, which is not true for exception stacks. So the
> > walk_stack() callbacks will need to know the location of the beginning
> > of the stack as well as the end.
> >
> > Another issue is that in general the various features of a stack (type,
> > size, next stack pointer, description string) are scattered around in
> > various places throughout the stack dump code.
>
> I finally figured out what visit_info is. But would it make more
> sense to track it in the unwind state so that the unwinder can
> directly make sure it doesn't start looping?

Well, the unwinders aren't the only users of get_stack_info() and the
visit_mask. show_trace_log_lvl() also uses it.

But it would probably be cleaner to at least do the visit_mask bit
testing/setting in get_stack_info() rather than in the in_*_stack()
functions.

> And please remove test_and_set_bit() -- it's pointlessly slow.

Ok.

>
> > +static bool in_hardirq_stack(unsigned long *stack, struct stack_info *info,
> > + unsigned long *visit_mask)
> > +{
> > + unsigned long *begin = (unsigned long *)this_cpu_read(hardirq_stack);
> > + unsigned long *end = begin + (THREAD_SIZE / sizeof(long));
> > +
> > + if (stack < begin || stack >= end)
> > + return false;
> > +
> > + if (visit_mask && test_and_set_bit(STACK_TYPE_IRQ, visit_mask))
> > + return false;
> > +
> > + info->type = STACK_TYPE_IRQ;
> > + info->begin = begin;
> > + info->end = end;
> > + info->next = (unsigned long *)*begin;
>
> This works, but it's a bit magic. I don't suppose we could get rid of
> this ->next thing entirely and teach show_stack_log_lvl(), etc. to
> move from stack to stack by querying the stack type of whatever the
> frame base address is if the frame base address ends up being out of
> bounds for the current stack? Or maybe the unwinder could even do
> this by itself.

I'm not quite sure what you mean here. The ->next stack pointer is
quite useful and it abstracts that ugliness away from the callers of
get_stack_info(). I'm open to any specific suggestions.

>
> > +static bool in_exception_stack(unsigned long *s, struct stack_info *info,
> > + unsigned long *visit_mask)
> > {
> > unsigned long stack = (unsigned long)s;
> > unsigned long begin, end;
> > @@ -44,55 +63,62 @@ static unsigned long *in_exception_stack(unsigned long *s, char **name,
> > if (stack < begin || stack >= end)
> > continue;
> >
> > - if (test_and_set_bit(k, visit_mask))
> > + if (visit_mask &&
> > + test_and_set_bit(STACK_TYPE_EXCEPTION + k, visit_mask))
> > return false;
> >
> > - *name = exception_stack_names[k];
> > - return (unsigned long *)end;
> > + info->type = STACK_TYPE_EXCEPTION + k;
> > + info->begin = (unsigned long *)begin;
> > + info->end = (unsigned long *)end;
> > + info->next = (unsigned long *)info->end[-2];
>
> This is so magical that I don't immediately see why it's correct.
> Presumably it's because the thing two slots down from the top of the
> stack is regs->sp? If so, that needs a comment.

Heck if I know, I just stole it from dump_trace() ;-)

I'll figure it out and add a comment.

> But again, couldn't we use the fact that we now know how to decode
> pt_regs to avoid needing this? I can imagine it being useful as a
> fallback in the event that the unwinder fails, but this is just a
> fallback.

Yeah, this is needed as a fallback. But I wouldn't call it "just" a
fallback: the stack dump code *needs* to be able to still traverse the
stacks if frame pointers fail.

> Also, NMI is weird and I'm wondering whether this works at
> all when trying to unwind from a looped NMI.

Unless I'm missing something, I think it should be fine for nested NMIs,
since they're all on the same stack. I can try to test it. What in
particular are you worried about?

> Fixing this up could be a followup after this series is in, I think --
> you're preserving existing behavior AFAICS. I just don't particularly
> like the existing behavior.
>
> FWIW, I think this code needs to be explicitly tested for the 32-bit
> double fault case. It's highly magical.

Ok, I'll test it.

--
Josh