Re: [PATCH v4 2/2] arm64: Expand the stack trace feature to support IRQ stack
From: Jungseok Lee
Date: Mon Oct 12 2015 - 10:53:16 EST
On Oct 9, 2015, at 11:24 PM, James Morse wrote:
> Hi Jungseok,
Hi James,
> On 07/10/15 16:28, Jungseok Lee wrote:
>> Currently, a call trace drops a process stack walk when a separate IRQ
>> stack is used. It makes a call trace information much less useful when
>> a system gets paniked in interrupt context.
>
> panicked
I will fix the typo.
>> This patch addresses the issue with the following schemes:
>>
>> - Store aborted stack frame data
>> - Decide whether another stack walk is needed or not via current sp
>> - Loosen the frame pointer upper bound condition
>
> It may be worth merging this patch with its predecessor - anyone trying to
> bisect a problem could land between these two patches, and spend time
> debugging the truncated call traces.
It was an original intention to lead them to this patch, not the [1/2] one.
This separation would help anyone touching the call trace feature including
me focus on these changes apart from stack allocation, IRQ recursion check
and thread_info management.
In addition, I would like to add a clear and sufficient explanation on the
frame pointer condition.
>> diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
>> index 6ea82e8..e5904a1 100644
>> --- a/arch/arm64/include/asm/irq.h
>> +++ b/arch/arm64/include/asm/irq.h
>> @@ -2,13 +2,25 @@
>> #define __ASM_IRQ_H
>>
>> #include <linux/irqchip/arm-gic-acpi.h>
>> +#include <asm/stacktrace.h>
>>
>> #include <asm-generic/irq.h>
>>
>> struct irq_stack {
>> void *stack;
>> + struct stackframe frame;
>> };
>>
>> +DECLARE_PER_CPU(struct irq_stack, irq_stacks);
>
> Good idea, storing this in the per-cpu data makes it immune to stack
> corruption.
>
>
>> diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
>> index 407991b..5124649 100644
>> --- a/arch/arm64/kernel/stacktrace.c
>> +++ b/arch/arm64/kernel/stacktrace.c
>> @@ -43,7 +43,27 @@ int notrace unwind_frame(struct stackframe *frame)
>> low = frame->sp;
>> high = ALIGN(low, THREAD_SIZE);
>>
>> - if (fp < low || fp > high - 0x18 || fp & 0xf)
>> + /*
>> + * A frame pointer would reach an upper bound if a prologue of the
>> + * first function of call trace looks as follows:
>> + *
>> + * stp x29, x30, [sp,#-16]!
>> + * mov x29, sp
>> + *
>> + * Thus, the upper bound is (top of stack - 0x20) with consideration
>
> The terms 'top' and 'bottom' of the stack are confusing, your 'top' appears
> to be the highest address, which is used first, making it the bottom of the
> stack.
>
> I would try to use the terms low/est and high/est, in keeping with the
> variable names in use here.
Good idea. I'm favor of those terms.
>> + * of a 16-byte empty space in THREAD_START_SP.
>> + *
>> + * The value, 0x20, however, does not cover all cases as interrupts
>> + * are handled using a separate stack. That is, a call trace can start
>> + * from elx_irq exception vectors. The symbols could not be promoted
>> + * to candidates for a stack trace under the restriction, 0x20.
>> + *
>> + * The scenario is handled without complexity as 1) considering
>> + * (bottom of stack + THREAD_START_SP) as a dummy frame pointer, the
>> + * content of which is 0, and 2) allowing the case, which changes
>> + * the value to 0x10 from 0x20.
>
> Where has 0x20 come from? The old value was 0x18.
What I meant is 0x20 is the highest valid frame pointer. The comment should
have been described more clearly.
> My understanding is the highest part of the stack looks like this:
> high [ off-stack ]
> high - 0x08 [ left free by THREAD_START_SP ]
> high - 0x10 [ left free by THREAD_START_SP ]
> high - 0x18 [#1 x30 ]
> high - 0x20 [#1 x29 ]
Clear description than mine!
> So the condition 'fp > high - 0x18' prevents returning either 'left free'
> address, or off-stack-value as a frame. Changing it to 'fp > high - 0x10'
> allows the first half of that reserved area to be a valid stack frame.
I believe my understanding is aligned with yours.
Under a current condition, 'fp > high - 0x18', it is impossible to catch the
'el1_irq' symbol. This is why I set x29 to high - 0x10 and changed the frame
pointer condition, but the changes fail to cover perf according to your data.
> This change is breaking perf using incantations [0] and [1]:
I'm reviewing how perf stack trace works..
> Before, with just patch 1/2:
> ---__do_softirq
> |
> |--92.95%-- __handle_domain_irq
> | __irqentry_text_start
> | el1_irq
> |
>
> After, with both patches:
> ---__do_softirq
> |
> |--83.83%-- __handle_domain_irq
> | __irqentry_text_start
> | el1_irq
> | |
> | |--99.39%-- 0x400008040d00000c
> | --0.61%-- [...]
> |
>
> Changing the condition to 'fp >= high - 0x10' fixes this.
'fp >= high - 0x10' drops 'el1_irq' when dump_stack() or panic() is called.
> I agree it needs documenting, it is quite fiddly - I think Akashi Takahiro
> is the expert.
If possible, it would be greatly helpful.
> I think unwind_frame() needs to walk the irq stack too. [2] is an example
> of perf tracing back to userspace, (and there are patches on the list to
> do/fix this), so we need to walk back to the start of the first stack for
> the perf accounting to be correct.
Frankly, I missed the case where perf does backtrace to userspace.
IMO, this statement supports why the stack trace feature commit should be
written independently. The [1/2] patch would be pretty stable if 64KB page
is supported. The separation might help us concentrate on the stack trace
feature in a generic dump stack, perf, and ftrace point of view.
>> + */
>> + if (fp < low || fp > high - 0x10 || fp & 0xf)
>> return -EINVAL;
>>
>> frame->sp = fp + 0x10;
>> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
>> index f93aae5..44b2f828 100644
>> --- a/arch/arm64/kernel/traps.c
>> +++ b/arch/arm64/kernel/traps.c
>> @@ -146,6 +146,8 @@ static void dump_instr(const char *lvl, struct pt_regs *regs)
>> static void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
>> {
>> struct stackframe frame;
>> + unsigned int cpu = smp_processor_id();
>
> I wonder if there is any case where dump_backtrace() is called on another cpu?
>
> Setting the cpu value from task_thread_info(tsk)->cpu would protect against
> this.
IMO, no, but your suggestion makes sense. I will update it.
>> + bool in_irq = in_irq_stack(cpu);
>>
>> pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk);
>>
>> @@ -170,6 +172,10 @@ static void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
>> }
>>
>> pr_emerg("Call trace:\n");
>> +repeat:
>> + if (in_irq)
>> + pr_emerg("<IRQ>\n");
>
> Do we need these? 'el1_irq()' in the trace is a giveaway…
I borrow this idea from x86 implementation in order to show a separate stack
explicitly. There is no issue to remove these tags, <IRQ> and <EOI>.
Great thanks!
Best Regards
Jungseok Lee--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/