RE:[PATCH] ARM64: fixed dump_backtrace() when task running on another cpu

From: Wang Qing
Date: Mon Apr 13 2020 - 05:24:10 EST


>Hi,
>
>On Thu, Apr 09, 2020 at 05:38:16PM +0800, Wang Qing wrote:
>> We cannot get FP and PC when the task is running on another CPU,
>> task->thread.cpu_context is the last time the task was switched out,
>> it's better to give a reminder than to provide wrong information.
>>
>> Signed-off-by: Wang Qing <wangqing@xxxxxxxx>
>
>Are you seeing this happen anywhere in particular today?

This problem is not so obvious, because it will not cause any exceptions
but will show "old" stack always ending with switch_to, I finally confirmed
the problem through debugging.

For example:Task blocked in spinlock/interrupt/busy loop, I want to print
the backtrace when detected(like PSI in Android), the printing is wrong(old).

>
>> ---
>> arch/arm64/kernel/traps.c | 8 ++++++++
>> 1 file changed, 8 insertions(+)
>>
>> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
>> index cf402be..c04e3e8 100644
>> --- a/arch/arm64/kernel/traps.c
>> +++ b/arch/arm64/kernel/traps.c
>> @@ -106,6 +106,14 @@ void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk)
>> start_backtrace(&frame,
>> (unsigned long)__builtin_frame_address(0),
>> (unsigned long)dump_backtrace);
>> + } else if (tsk->on_cpu) {
>> + /*
>> + * The task is running in another cpu, so the call stack
>> + * is changing and we cannot get it.
>> + */
>> + pr_warn("tsk: %s is running in CPU%d, Don't call trace!\n",
>> + tsk->comm, tsk->cpu);
>
>I believe that we can race with a concurrent write to tsk->cpu in both
>cases above. We could use READ_ONCE() to get a snapshot, but we can
>still race and miss cases where the task was runnning as we backtrace
>it.
>
>Thanks,
>Mark.

I will use task_cpu(tsk) instead of tsk->cpu, and add task_running_oncpu() in
include/linux/sched.h instead of tsk->on_cpu, but as you said, by this patch,
we can still race and miss cases where the task was runnning as we backtrace.

But from the user's perspective, printing wrong backtrace is confused when
we call this function while task already running. However, it's reasonable to
print the last backtrace when task enter running during the function is called.

Thanks,
Wang Qing.

>
>> + return;
>> } else {
>> /*
>> * task blocked in __switch_to
>> --
>> 2.7.4
>>