Re: [PATCH] kernel/hung_task.c: Monitor killed tasks.

From: Petr Mladek
Date: Wed May 15 2019 - 06:57:31 EST


On Mon 2019-05-13 20:02:11, Tetsuo Handa wrote:
> syzbot's second top report is "no output from test machine" where the
> userspace process failed to spawn a new test process for 300 seconds
> for some reason. One of reasons which can result in this report is that
> an already spawned test process was unable to terminate (e.g. trapped at
> an unkillable retry loop due to some bug) after SIGKILL was sent to that
> process. Therefore, reporting when a thread is failing to terminate
> despite a fatal signal is pending would give us more useful information.
>
> This version shares existing sysctl settings (e.g. check interval,
> timeout, whether to panic) used for detecting TASK_UNINTERRUPTIBLE
> threads, for I don't know whether people want to use a new kernel
> config option and different sysctl settings for monitoring killed
> threads.
>
> Signed-off-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
> Cc: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
> ---
> include/linux/sched.h | 1 +
> kernel/hung_task.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 45 insertions(+)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index a2cd1585..d42bdd7 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -850,6 +850,7 @@ struct task_struct {
> #ifdef CONFIG_DETECT_HUNG_TASK
> unsigned long last_switch_count;
> unsigned long last_switch_time;
> + unsigned long killed_time;

I would call this fatal_signal_time to make the meaning more clear.

> #endif
> /* Filesystem information: */
> struct fs_struct *fs;
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index f108a95..34e7b84 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -141,6 +141,47 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
> touch_nmi_watchdog();
> }
>
> +static void check_killed_task(struct task_struct *t, unsigned long timeout)
> +{
> + unsigned long stamp = t->killed_time;
> +
> + /*
> + * Ensure the task is not frozen.
> + * Also, skip vfork and any other user process that freezer should skip.
> + */
> + if (unlikely(t->flags & (PF_FROZEN | PF_FREEZER_SKIP)))
> + return;
> + /*
> + * Skip threads which are already inside do_exit(), for exit_mm() etc.
> + * might take many seconds.
> + */
> + if (t->flags & PF_EXITING)
> + return;
> + if (!stamp) {
> + stamp = jiffies;
> + if (!stamp)
> + stamp++;
> + t->killed_time = stamp;
> + return;
> + }

I might be too dumb but the above code looks pretty tricky to me.
It would deserve a comment. Or better, I would remove
trick to handle overflow. If it happens, we would just
lose one check period.

Alternative solution would be to set the timestamp in
complete_signal(). Then we would know that the timestamp
is always valid when a fatal signal is pending.


> + if (time_is_after_jiffies(stamp + timeout * HZ))
> + return;
> + trace_sched_process_hang(t);
> + if (sysctl_hung_task_panic) {
> + console_verbose();
> + hung_task_call_panic = true;

IMHO, the delayed task exit is much less fatal than sleeping
in an uninterruptible state.

Anyway, the check is much less reliable. In case of hung_task,
it is enough when the task gets scheduled. In the new check,
the task has to do some amount of work until the signal
gets handled and do_exit() is called.

The panic should either get enabled separately or we should
never panic in this case.

Best Regards,
Petr