Re: [PATCH] zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING

From: Eric W. Biederman
Date: Thu Jun 13 2024 - 08:42:13 EST


Oleg Nesterov <oleg@xxxxxxxxxx> writes:

> kernel_wait4() doesn't sleep and returns -EINTR if there is no
> eligible child and signal_pending() is true.
>
> That is why zap_pid_ns_processes() clears TIF_SIGPENDING but this is not
> enough, it should also clear TIF_NOTIFY_SIGNAL to make signal_pending()
> return false and avoid a busy-wait loop.

I took a look through the code. It used to be that TIF_NOTIFY_SIGNAL
was all about waking up a task so that task_work_run can be used.
io_uring still mostly uses it that way. There is also a use in
kthread_stop that just uses it as a TIF_SIGPENDING without having a
pending signal.

At the point in do_exit where exit_notify and thus zap_pid_ns_processes
is called I can't possibly see a use for TIF_NOTIFY_SIGNAL.
exit_task_work, exit_signals, and io_uring_cancel have all been called.

So TIF_NOTIFY_SIGNAL should be spurious at this point and safe to clear.
Why it remains set is a mystery to me.


If I had infinite time and energy the ideal is to rework the pid
namespace exit logic so that waiting for everything to exit works like
delay_group_leader in wait_task_consider. Simply blocking reaping of
the pid namespace leader until everything in the pid namespace have been
reaped. I think acct_exit_ns is the only piece of code that needs
to be moved to allow that, and acct_exit_ns is purely bookkeeping so
does not affect userspace visible semantics.

This active waiting is weird and non-standard in the kernel and winds up
causeing a problem every couple of years because of that.

>
> Fixes: 12db8b690010 ("entry: Add support for TIF_NOTIFY_SIGNAL")
> Reported-by: Rachel Menge <rachelmenge@xxxxxxxxxxxxxxxxxxx>
> Closes: https://lore.kernel.org/all/1386cd49-36d0-4a5c-85e9-bc42056a5a38@xxxxxxxxxxxxxxxxxxx/
> Signed-off-by: Oleg Nesterov <oleg@xxxxxxxxxx>
> ---
> kernel/pid_namespace.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index dc48fecfa1dc..25f3cf679b35 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
> @@ -218,6 +218,7 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
> */
> do {
> clear_thread_flag(TIF_SIGPENDING);
> + clear_thread_flag(TIF_NOTIFY_SIGNAL);
> rc = kernel_wait4(-1, NULL, __WALL, NULL);
> } while (rc != -ECHILD);