Re: [PATCH RFC exit] Sleep at TASK_IDLE when waiting for application core dump

From: Paul E. McKenney
Date: Thu Jul 25 2024 - 12:24:24 EST


On Thu, Jul 25, 2024 at 03:29:35PM +0200, Oleg Nesterov wrote:
> On 07/24, Paul E. McKenney wrote:
> >
> > Currently, the coredump_task_exit() function sets the task state to
> > TASK_UNINTERRUPTIBLE|TASK_FREEZABLE, which usually works well. But a
> > combination of large memory and slow (and/or highly contended) mass
> > storage can cause application core dumps to take more than two minutes,
> > which can triggers "task blocked" splats.
>
> Do you mean check_hung_uninterruptible_tasks() ?

Yes, from its call to check_hung_task(). Good point, I will add that
to the commit log.

> In any case,
>
> > Therefore, change that TASK_UNINTERRUPTIBLE to TASK_IDLE.
> ...
> > @@ -429,7 +429,7 @@ static void coredump_task_exit(struct task_struct *tsk)
> > complete(&core_state->startup);
> >
> > for (;;) {
> > - set_current_state(TASK_UNINTERRUPTIBLE|TASK_FREEZABLE);
> > + set_current_state(TASK_IDLE|TASK_FREEZABLE);
>
> To me this change makes sense regardless...
>
> To some degree TASK_UNINTERRUPTIBLE is misleading, in that the task which
> sleeps in coredump_task_exit() is _KILLABLE, although not "directly".
>
> A SIGKILL sent to the coredumping process will interrupt the coredumping thread
> (see the signal->core_state check in prepare_signal() and fatal_signal_pending()
> check in dump_interrupted()) and then this thread will wakeup the threads sleeping
> in coredump_task_exit(), see coredump_finish().

Very good, I will add this to the commit log with attribution.

> Acked-by: Oleg Nesterov <oleg@xxxxxxxxxx>

Thank you for the thorough review! How does the updated patch shown
below look to you?

Thanx, Paul

------------------------------------------------------------------------

commit a6c7779283d67a409b81616a5b485ac21637d7e7
Author: Paul E. McKenney <paulmck@xxxxxxxxxx>
Date: Wed Jul 24 16:51:52 2024 -0700

exit: Sleep at TASK_IDLE when waiting for application core dump

Currently, the coredump_task_exit() function sets the task state
to TASK_UNINTERRUPTIBLE|TASK_FREEZABLE, which usually works well.
But a combination of large memory and slow (and/or highly contended)
mass storage can cause application core dumps to take more than
two minutes, which can cause check_hung_task(), which is invoked by
check_hung_uninterruptible_tasks(), to produce task-blocked splats.
There does not seem to be any reasonable benefit to getting these splats.

Furthermore, as Oleg Nesterov points out, TASK_UNINTERRUPTIBLE could
be misleading because the task sleeping in coredump_task_exit() really
is killable, albeit indirectly. See the check of signal->core_state
in prepare_signal() and the check of fatal_signal_pending()
in dump_interrupted(), which bypass the normal unkillability of
TASK_UNINTERRUPTIBLE, resulting in coredump_finish() invoking
wake_up_process() on any threads sleeping in coredump_task_exit().

Therefore, change that TASK_UNINTERRUPTIBLE to TASK_IDLE.

Reported-by: Anhad Jai Singh <ffledgling@xxxxxxxx>
Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
Acked-by: Oleg Nesterov <oleg@xxxxxxxxxx>
Cc: Jens Axboe <axboe@xxxxxxxxx>
Cc: Christian Brauner <brauner@xxxxxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Cc: "Matthew Wilcox (Oracle)" <willy@xxxxxxxxxxxxx>
Cc: Chris Mason <clm@xxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxxx>

diff --git a/kernel/exit.c b/kernel/exit.c
index f95a2c1338a8..b0d18f7b6d15 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -429,7 +429,7 @@ static void coredump_task_exit(struct task_struct *tsk)
complete(&core_state->startup);

for (;;) {
- set_current_state(TASK_UNINTERRUPTIBLE|TASK_FREEZABLE);
+ set_current_state(TASK_IDLE|TASK_FREEZABLE);
if (!self.task) /* see coredump_finish() */
break;
schedule();