Re: [PATCH 19/19] x86/dumpstack: print any pt_regs found on the stack

From: Josh Poimboeuf
Date: Thu Jul 21 2016 - 23:30:40 EST

On Thu, Jul 21, 2016 at 03:32:32PM -0700, Andy Lutomirski wrote:
> On Thu, Jul 21, 2016 at 2:21 PM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> > Now that we can find pt_regs registers in the middle of the stack due to
> > an interrupt or exception, we can print them. Here's what it looks
> > like:
> >
> > ...
> > [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0
> > [<ffffffff8189f558>] async_page_fault+0x28/0x30
> > RIP: 0010:[<ffffffff814529e2>] [<ffffffff814529e2>] __clear_user+0x42/0x70
> > RSP: 0018:ffff88007876fd38 EFLAGS: 00010202
> > RAX: 0000000000000000 RBX: 0000000000000138 RCX: 0000000000000138
> > RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000061b640
> > RBP: ffff88007876fd48 R08: 0000000dc2ced0d0 R09: 0000000000000000
> > R10: 0000000000000001 R11: 0000000000000000 R12: 000000000061b640
> > R13: 0000000000000000 R14: ffff880078770000 R15: ffff880079947200
> > [<ffffffff814529e2>] ? __clear_user+0x42/0x70
> > [<ffffffff814529c3>] ? __clear_user+0x23/0x70
> > [<ffffffff81452a7b>] clear_user+0x2b/0x40
> > ...
> This looks wrong. Here are some theories:
> (a) __clear_user is a reliable address that is indicated by RIP: ....
> Then it's found again as an unreliable address as "?
> __clear_user+0x42/0x70" by scanning the stack. "?
> __clear_user+0x23/0x70" is a genuine leftover artifact on the stack.
> In this case, shouldn't "? __clear_user+0x42/0x70" have been
> suppressed because it matched a reliable address?
> (b) You actually intended for all the addresses to be printed, in
> which case "? __clear_user+0x42/0x70" should have been
> "__clear_user+0x42/0x70" and you have a bug. In this case, it's
> plausible that your state machine got a bit lost leading to "?
> __clear_user+0x23/0x70" as well (i.e. it's not just an artifact --
> it's a real frame and you didn't find it).
> (c) Something else and I'm confused.

So there's a subtle difference between addresses reported by regs->ip
and normal return addresses. For example:

[<ffffffff8189ff4d>] smp_apic_timer_interrupt+0x3d/0x50
[<ffffffff8189de6e>] apic_timer_interrupt+0x9e/0xb0
RIP: 0010:[<ffffffff8129b350>] [<ffffffff8129b350>] path_init+0x0/0x750
RSP: 0018:ffff880036a3fd80 EFLAGS: 00000296
RAX: ffff88003691aa40 RBX: ffff880036a3ff08 RCX: ffff880036a3ff08
RDX: ffff880036a3ff08 RSI: 0000000000000041 RDI: ffff880036a3fdb0
RBP: ffff880036a3fda0 R08: 0000000000000000 R09: 0000000000000010
R10: 8080808080808080 R11: fefefefefefefeff R12: ffff880036a3fdb0
R13: 0000000000000001 R14: ffff880036a3ff08 R15: 0000000000000000
[<ffffffff8129b350>] ? lookup_fast+0x3d0/0x3d0
[<ffffffff8129c81b>] ? path_lookupat+0x1b/0x120
[<ffffffff8129ddd1>] filename_lookup+0xb1/0x180

In this case the irq hit right after path_lookupat() called into
path_init(). So the "path_init+0x0" printed by __show_regs() is right.

Note the backtrace reports the same address, but it instead describes it
as "lookup_fast+0x3d0", which is the end of lookup_fast(). That's
because normally, such an address after a call instruction at the end of
a function would indicate a tail call (e.g., to a noreturn function).
If that were the case, printing "path_init+0x0" would be completely
wrong, because path_init() just happens to be the function located
immediately after lookup_fast().

Maybe I could add some special logic to say: "if this return address was
from a call, use printk_stack_address(); else if it was from a pt_regs,
use printk_address()." (The former prints the preceding function, the
latter prints the current function.) Then we could remove the question

There's also the question of whether or not the address should be
printed again, after it's already been printed by __show_regs(). I
don't have a strong opinion either way.