Re: [PATCH 2/7] stacktrace,sched: Make stack_trace_save_tsk() more robust

From: Peter Zijlstra
Date: Mon Oct 25 2021 - 12:35:12 EST


On Fri, Oct 22, 2021 at 05:09:35PM +0200, Peter Zijlstra wrote:
> --- a/kernel/stacktrace.c
> +++ b/kernel/stacktrace.c
> @@ -123,6 +123,13 @@ unsigned int stack_trace_save(unsigned l
> }
> EXPORT_SYMBOL_GPL(stack_trace_save);
>
> +static int try_arch_stack_walk_tsk(struct task_struct *tsk, void *arg)
> +{
> + stack_trace_consume_fn consume_entry = stack_trace_consume_entry_nosched;
> + arch_stack_walk(consume_entry, arg, tsk, NULL);
> + return 0;
> +}
> +
> /**
> * stack_trace_save_tsk - Save a task stack trace into a storage array
> * @task: The task to examine
> @@ -135,7 +142,6 @@ EXPORT_SYMBOL_GPL(stack_trace_save);
> unsigned int stack_trace_save_tsk(struct task_struct *tsk, unsigned long *store,
> unsigned int size, unsigned int skipnr)
> {
> - stack_trace_consume_fn consume_entry = stack_trace_consume_entry_nosched;
> struct stacktrace_cookie c = {
> .store = store,
> .size = size,
> @@ -143,11 +149,8 @@ unsigned int stack_trace_save_tsk(struct
> .skip = skipnr + (current == tsk),
> };
>
> - if (!try_get_task_stack(tsk))
> - return 0;

So I took that out because task_try_func() pins the task, except now
I see that stack_trace_save_tsk_reliable() has a comment about zombies,
which I suppose applies equally here and to wchan.

An alternative to bailing out when try_get_task_stack() fails is to
check PF_EXITING in try_arch_stack_walk_tsk(), which seems like more
consistent behaviour since it doesn't depend on
CONFIG_THREAD_INFO_IN_TASK.
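
Roughly the below, as a user-space mock only and not a tested kernel
patch. struct task_struct, struct stacktrace_cookie and the PF_EXITING
bit are stand-ins defined locally, and mock_arch_stack_walk() and
mock_stack_trace_save_tsk() are hypothetical helpers that merely
imitate arch_stack_walk() and the task_try_func() call:

```c
#include <stddef.h>

/* Mock flag bit; stands in for the kernel's PF_EXITING. */
#define PF_EXITING 0x00000004u

/* Mock task: only the field this sketch needs. */
struct task_struct {
	unsigned int flags;
};

/* Mock of the cookie that collects the trace entries. */
struct stacktrace_cookie {
	unsigned long *store;
	unsigned int size;
	unsigned int skip;
	unsigned int len;
};

/* Hypothetical stand-in for arch_stack_walk(): records one fake entry. */
static void mock_arch_stack_walk(struct stacktrace_cookie *c,
				 struct task_struct *tsk)
{
	(void)tsk;
	if (c->len < c->size)
		c->store[c->len++] = 0xdeadbeefUL;
}

/* Callback: skip the walk entirely for a task that is exiting. */
static int try_arch_stack_walk_tsk(struct task_struct *tsk, void *arg)
{
	if (tsk->flags & PF_EXITING)
		return 0;	/* trace stays empty, c.len remains 0 */

	mock_arch_stack_walk(arg, tsk);
	return 0;
}

/* Hypothetical caller; the direct call stands in for task_try_func(). */
static unsigned int mock_stack_trace_save_tsk(struct task_struct *tsk,
					      unsigned long *store,
					      unsigned int size)
{
	struct stacktrace_cookie c = {
		.store = store,
		.size = size,
		.skip = 0,
		.len = 0,
	};

	try_arch_stack_walk_tsk(tsk, &c);
	return c.len;
}
```

With that, an exiting task yields an empty trace regardless of how the
stack is allocated, which is the same result the old
try_get_task_stack() failure path produced.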

> + task_try_func(tsk, try_arch_stack_walk_tsk, &c);
>
> - arch_stack_walk(consume_entry, &c, tsk, NULL);
> - put_task_stack(tsk);
> return c.len;
> }
>
>
>