Re: [PATCH] x86/dumpstack: Walk frames when built with frame pointers

From: Ingo Molnar
Date: Sun Apr 27 2014 - 08:09:15 EST



* Richard Yao <ryao@xxxxxxxxxx> wrote:

> Stack traces are generated by scanning the stack and interpeting
> anything that looks like it could be a pointer to something. We do
> not need to do this when we have frame pointers, but we do it
> anyway, with the distinction that we use the return pointers to mark
> actual frames by the absence of a question mark.
>
> The additional verbosity of stack scanning tends to bombard us with
> walls of text for no gain in practice, so lets switch to printing
> only stack frames when frame pointers are available. That we can
> spend less time reading stack traces and more time looking at code.
>
> Signed-off-by: Richard Yao <ryao@xxxxxxxxxx>
> ---
> arch/x86/kernel/dumpstack.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
> index d9c12d3..94ffe06 100644
> --- a/arch/x86/kernel/dumpstack.c
> +++ b/arch/x86/kernel/dumpstack.c
> @@ -162,7 +162,11 @@ static void print_trace_address(void *data, unsigned long addr, int reliable)
> static const struct stacktrace_ops print_trace_ops = {
> .stack = print_trace_stack,
> .address = print_trace_address,
> +#ifdef CONFIG_FRAME_POINTER
> + .walk_stack = print_context_stack_bp,
> +#else
> .walk_stack = print_context_stack,
> +#endif
> };

I don't really like this patch, and this question comes up often, so
let me try to explain it more verbosely.

Lets take a real-life example of a BP-only trace about a lockup:

Call Trace:
[<79018d24>] arch_trigger_all_cpu_backtrace+0x3c/0x64
[<7916a34c>] do_raw_spin_lock+0xb7/0xe8
[<792b9412>] _raw_spin_lock_irqsave+0x35/0x3e
[<790391e8>] prepare_to_wait+0x18/0x57
[<792b7158>] __wait_on_bit+0x20/0x5f
[<792b71eb>] out_of_line_wait_on_bit+0x54/0x5c
[<790e4eff>] __wait_on_buffer+0x1b/0x1e
[<7910f615>] __ext3_get_inode_loc+0x1ef/0x254
[<7911015f>] ext3_iget+0x45/0x346
[<79114e03>] ext3_lookup+0x97/0xa6
[<790c9c0e>] lookup_real+0x20/0x35
[<790ca1d2>] __lookup_hash+0x2a/0x31
[<790caa0e>] lookup_slow+0x36/0x8c
[<790cbdda>] path_lookupat+0xf9/0x5c5
[<790cc2c5>] filename_lookup+0x1f/0x84
[<790cde98>] user_path_at_empty+0x3f/0x65
[<790cdecb>] user_path_at+0xd/0xf
[<790c6585>] vfs_fstatat+0x40/0x88
[<790c65f8>] vfs_stat+0x13/0x15
[<790c67e2>] sys_stat64+0x11/0x22
[<792b9d04>] syscall_call+0x7/0xb

What does this call trace tell us? That we locked up somewhere in the
ext3 code. That's all that we know.

Now lets take a look at a 'verbose' entry of the same lockup, that
also outputs other entries that 'look like' kernel text addresses:

Call Trace:
[<79018d24>] arch_trigger_all_cpu_backtrace+0x3c/0x64
[<7916a34c>] do_raw_spin_lock+0xb7/0xe8
[<792b9412>] _raw_spin_lock_irqsave+0x35/0x3e
[<790391e8>] ? prepare_to_wait+0x18/0x57
[<790391e8>] prepare_to_wait+0x18/0x57
[<7914a320>] ? generic_make_request+0x80/0xb5
[<790e4f30>] ? unmap_underlying_metadata+0x2e/0x2e
[<792b7158>] __wait_on_bit+0x20/0x5f
[<7914a427>] ? submit_bio+0xd2/0xfb
[<792b71eb>] out_of_line_wait_on_bit+0x54/0x5c
[<790e4f30>] ? unmap_underlying_metadata+0x2e/0x2e
[<79039086>] ? autoremove_wake_function+0x31/0x31
[<790e4eff>] __wait_on_buffer+0x1b/0x1e
[<7910f615>] __ext3_get_inode_loc+0x1ef/0x254
[<7911015f>] ext3_iget+0x45/0x346
[<79114e03>] ext3_lookup+0x97/0xa6
[<790c9c0e>] lookup_real+0x20/0x35
[<790ca1d2>] __lookup_hash+0x2a/0x31
[<790caa0e>] lookup_slow+0x36/0x8c
[<790cbdda>] path_lookupat+0xf9/0x5c5
[<790cc2c5>] filename_lookup+0x1f/0x84
[<790cde98>] user_path_at_empty+0x3f/0x65
[<790cdecb>] user_path_at+0xd/0xf
[<790c6585>] vfs_fstatat+0x40/0x88
[<7903f844>] ? lg_local_unlock+0x31/0x47
[<790c65f8>] vfs_stat+0x13/0x15
[<790c67e2>] sys_stat64+0x11/0x22
[<790c3c47>] ? __fput+0x187/0x1a0
[<792b9d37>] ? restore_all+0xf/0xf
[<79165bb4>] ? trace_hardirqs_on_thunk+0xc/0x10
[<792b9d04>] syscall_call+0x7/0xb

Firstly, it's still easy to tell the true backtrace: ignore all the
'?' entries.

But the '?' entries also tell us something more: that recently this
CPU submitted IO. That might or might not be suspected from the
original trace, but from the 'extended' trace it's really obvious that
IO activity happened before the lockup as well.

There were many other examples in the past where '?' entries gave us
clues about the nature of a bug: they represent a poor man's trace of
what happened recently on that CPU.

So it's useful information for hairy bugs and it would be sad to
remove them.

Having said that, your complaint that '?' entries can make reading of
back traces more difficult is valid as well - so maybe we can do
something about that.

For example if we used more horizontal separation in the output
format:

Call Trace:
[<79018d24>] arch_trigger_all_cpu_backtrace+0x3c/0x64
[<7916a34c>] do_raw_spin_lock+0xb7/0xe8
[<792b9412>] _raw_spin_lock_irqsave+0x35/0x3e
[<790391e8>] # ? prepare_to_wait+0x18/0x57
[<790391e8>] prepare_to_wait+0x18/0x57
[<7914a320>] # ? generic_make_request+0x80/0xb5
[<790e4f30>] # ? unmap_underlying_metadata+0x2e/0x2e
[<792b7158>] __wait_on_bit+0x20/0x5f
[<7914a427>] # ? submit_bio+0xd2/0xfb
[<792b71eb>] out_of_line_wait_on_bit+0x54/0x5c
[<790e4f30>] # ? unmap_underlying_metadata+0x2e/0x2e
[<79039086>] # ? autoremove_wake_function+0x31/0x31
[<790e4eff>] __wait_on_buffer+0x1b/0x1e
[<7910f615>] __ext3_get_inode_loc+0x1ef/0x254
[<7911015f>] ext3_iget+0x45/0x346
[<79114e03>] ext3_lookup+0x97/0xa6
[<790c9c0e>] lookup_real+0x20/0x35
[<790ca1d2>] __lookup_hash+0x2a/0x31
[<790caa0e>] lookup_slow+0x36/0x8c
[<790cbdda>] path_lookupat+0xf9/0x5c5
[<790cc2c5>] filename_lookup+0x1f/0x84
[<790cde98>] user_path_at_empty+0x3f/0x65
[<790cdecb>] user_path_at+0xd/0xf
[<790c6585>] vfs_fstatat+0x40/0x88
[<7903f844>] # ? lg_local_unlock+0x31/0x47
[<790c65f8>] vfs_stat+0x13/0x15
[<790c67e2>] sys_stat64+0x11/0x22
[<790c3c47>] # ? __fput+0x187/0x1a0
[<792b9d37>] # ? restore_all+0xf/0xf
[<79165bb4>] # ? trace_hardirqs_on_thunk+0xc/0x10
[<792b9d04>] syscall_call+0x7/0xb

then the information would still be there, it would still be in the
right place, but it would also be much easier to ignore the right side
column it visually. (The '# ?' prefix would also make it clear that
this information is not so important as the left hand side column.)

Or something like that. Assuming this doesn't confuse tools and such.

Thanks,

Ingo

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/