Re: [PATCH] kernel/hung_task.c: Replace trigger_all_cpu_backtrace() with task traversal.

From: Tetsuo Handa
Date: Fri May 03 2019 - 06:44:41 EST


Dmitry, I know you are currently OOO.

For the record, console outputs from two bug reports show that syzbot is
dropping the hint about the culprit thread that caused khungtaskd to fire.

https://syzkaller.appspot.com/text?tag=CrashLog&x=1104bb90a00000
https://syzkaller.appspot.com/text?tag=CrashLog&x=135ff034a00000
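
As background for why the outputs get separated: with CONFIG_PRINTK_CALLER=y
each printk() line is tagged with either a thread ID or a CPU ID, depending on
whether it was emitted from task context. Below is a minimal userspace sketch
of that tagging as I understand it. The names model_caller_id() and
model_format_caller() are made up for illustration and are not kernel
functions; the 0x80000000 flag and the 'T'/'C' prefixes follow my reading of
kernel/printk/printk.c and should be treated as an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed flag bit marking "not in task context" in the caller ID. */
#define MODEL_CALLER_CPU_FLAG 0x80000000u

/*
 * Hypothetical stand-in for the kernel's caller ID computation:
 * task context yields the thread's PID, any other context yields
 * the flag bit plus the CPU number.
 */
static uint32_t model_caller_id(int in_task, uint32_t pid, uint32_t cpu)
{
	return in_task ? pid : MODEL_CALLER_CPU_FLAG + cpu;
}

/* Render the caller ID as "T<pid>" or "C<cpu>" into buf. */
static void model_format_caller(uint32_t id, char *buf, size_t len)
{
	snprintf(buf, len, "%c%u",
		 (id & MODEL_CALLER_CPU_FLAG) ? 'C' : 'T',
		 (unsigned int)(id & ~MODEL_CALLER_CPU_FLAG));
}
```

In this model, a report printed by khungtaskd with, say, PID 1047 is tagged
T1047, while an NMI backtrace taken on CPU 1 is tagged C1 no matter which
thread CPU 1 was running. A tool that groups console lines by tag would see
these as two unrelated contexts.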

On 2019/04/29 20:52, Tetsuo Handa wrote:
> Since trigger_all_cpu_backtrace() uses the NMI interface, printk() on other
> CPUs is called from interrupt context. Therefore, CONFIG_PRINTK_CALLER=y
> needlessly separates printk() output of the khungtaskd kernel thread running
> on the current CPU from printk() output of threads running on other CPUs.
>
> Also, it is pointless for trigger_all_cpu_backtrace() to report the
> khungtaskd kernel thread running on the current CPU, because the purpose of
> calling trigger_all_cpu_backtrace() from khungtaskd is to report running
> threads which might have caused other threads to be blocked for so long.
>
> Therefore, report threads (except the khungtaskd kernel thread itself) which
> are currently running on a CPU, using a task traversal approach. This allows
> syzbot to include backtraces of running threads in its report files.
>
> Signed-off-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> ---
> kernel/hung_task.c | 19 ++++++++++++++++++-
> 1 file changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index f108a95..2fddd98 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -164,6 +164,23 @@ static bool rcu_lock_break(struct task_struct *g, struct task_struct *t)
>  	return can_cont;
>  }
>  
> +static void print_all_running_threads(void)
> +{
> +#ifdef CONFIG_SMP
> +	struct task_struct *g;
> +	struct task_struct *t;
> +
> +	rcu_read_lock();
> +	for_each_process_thread(g, t) {
> +		if (!t->on_cpu || t == current)
> +			continue;
> +		pr_err("INFO: Currently running\n");
> +		sched_show_task(t);
> +	}
> +	rcu_read_unlock();
> +#endif
> +}
> +
>  /*
>   * Check whether a TASK_UNINTERRUPTIBLE does not get woken up for
>   * a really long time (120 seconds). If that happens, print out
> @@ -201,7 +218,7 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>  	if (hung_task_show_lock)
>  		debug_show_all_locks();
>  	if (hung_task_call_panic) {
> -		trigger_all_cpu_backtrace();
> +		print_all_running_threads();
>  		panic("hung_task: blocked tasks");
>  	}
>  }
>
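
To make the selection rule in print_all_running_threads() easier to check
outside the kernel, here is a small userspace sketch of the same filter: skip
every task that is not on a CPU, and skip the caller itself. struct model_task
and model_running_threads() are hypothetical names invented for this sketch;
it models only the on_cpu check and leaves out RCU and sched_show_task().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct task_struct. */
struct model_task {
	int pid;
	bool on_cpu;	/* mirrors t->on_cpu: task is executing on some CPU */
};

/*
 * Collect the PIDs of tasks the filter would report: tasks that are
 * on a CPU, excluding the caller (current_task). Stores at most
 * out_len PIDs in out and returns how many were stored.
 */
static size_t model_running_threads(const struct model_task *tasks, size_t n,
				    const struct model_task *current_task,
				    int *out, size_t out_len)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		const struct model_task *t = &tasks[i];

		/* Same condition as the patch: not running, or it is us. */
		if (!t->on_cpu || t == current_task)
			continue;
		if (count < out_len)
			out[count++] = t->pid;
	}
	return count;
}
```

With four tasks of which two are on a CPU, and the caller being one of those
two, only the other running task is reported. That is the intended effect:
khungtaskd itself never shows up in the list of suspects.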