Re: [PATCH 19/19] x86/dumpstack: print any pt_regs found on the stack

From: Andy Lutomirski
Date: Thu Jul 21 2016 - 18:32:59 EST


On Thu, Jul 21, 2016 at 2:21 PM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
> Now that we can find pt_regs registers in the middle of the stack due to
> an interrupt or exception, we can print them. Here's what it looks
> like:
>
> ...
> [<ffffffff8106f7dc>] do_async_page_fault+0x2c/0xa0
> [<ffffffff8189f558>] async_page_fault+0x28/0x30
> RIP: 0010:[<ffffffff814529e2>] [<ffffffff814529e2>] __clear_user+0x42/0x70
> RSP: 0018:ffff88007876fd38 EFLAGS: 00010202
> RAX: 0000000000000000 RBX: 0000000000000138 RCX: 0000000000000138
> RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000061b640
> RBP: ffff88007876fd48 R08: 0000000dc2ced0d0 R09: 0000000000000000
> R10: 0000000000000001 R11: 0000000000000000 R12: 000000000061b640
> R13: 0000000000000000 R14: ffff880078770000 R15: ffff880079947200
> [<ffffffff814529e2>] ? __clear_user+0x42/0x70
> [<ffffffff814529c3>] ? __clear_user+0x23/0x70
> [<ffffffff81452a7b>] clear_user+0x2b/0x40
> ...

This looks wrong. Here are some theories:

(a) __clear_user+0x42/0x70 is a reliable address, indicated by the
"RIP: ..." line. It is then found again by the stack scan and printed
as the unreliable entry "? __clear_user+0x42/0x70", while
"? __clear_user+0x23/0x70" is a genuine leftover artifact on the
stack. In this case, shouldn't "? __clear_user+0x42/0x70" have been
suppressed because it matched a reliable address?

(b) You actually intended for all of these addresses to be printed, in
which case "? __clear_user+0x42/0x70" should have been printed as
"__clear_user+0x42/0x70" and you have a bug. In this case, it's
plausible that your state machine also got a bit lost, which would
explain "? __clear_user+0x23/0x70" as well (i.e., it's not just an
artifact; it's a real frame that you didn't find).

(c) Something else, and I'm confused.

--Andy