Re: [PATCH V3] exit: trigger panic when global init has exited

From: qianli zhao
Date: Fri Mar 19 2021 - 02:34:36 EST


Hi,Eric

> As I understand it this patch has two purposes:
> 1. Avoid the BUG_ON in zap_pid_ns_processes when !CONFIG_PID_NS
> 2. panic as early as possible so exiting threads don't removing
> interesting debugging state.

Your understanding is very correct,this is what my patch wants to do

> I think if we are going to move the decrement of signal->live that
> should be it's own patch and be accompanied with a good description of
> why it is safe instead of having the decrement of signal->live be there
> as a side effect of another change.

I will think about the risks of movement of the decrement of
signal->live before exit_signal().
If is difficult to judge movement of the decrement of signal->live is
safe,how about only test 'signal->live==1' not use group_dead?

Such as:
diff --git a/kernel/exit.c b/kernel/exit.c
index 04029e3..87f3595 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -767,6 +767,17 @@ void __noreturn do_exit(long code)
validate_creds_for_do_exit(tsk);

/*
+ * If global init has exited,
+ * panic immediately to get a useable coredump.
+ */
+ if (unlikely(is_global_init(tsk) &&
+ ((atomic_read(&tsk->signal->live) == 1) || /*current is
last init thread*/
+ (tsk->signal->flags & SIGNAL_GROUP_EXIT)))) {
+ panic("Attempted to kill init! exitcode=0x%08x\n",
+ tsk->signal->group_exit_code ?: (int)code);
+ }
+
+ /*
* We're taking recursive faults here in do_exit. Safest is to just
* leave this task alone and wait for reboot.
*/
@@ -784,16 +795,9 @@ void __noreturn do_exit(long code)
if (tsk->mm)
sync_mm_rss(tsk->mm);
acct_update_integrals(tsk);
+
group_dead = atomic_dec_and_test(&tsk->signal->live);
if (group_dead) {
- /*
- * If the last thread of global init has exited, panic
- * immediately to get a useable coredump.
- */
- if (unlikely(is_global_init(tsk)))
- panic("Attempted to kill init! exitcode=0x%08x\n",
- tsk->signal->group_exit_code ?: (int)code);
-

Eric W. Biederman <ebiederm@xxxxxxxxxxxx> 于2021年3月19日周五 上午3:09写道:
>
> Oleg Nesterov <oleg@xxxxxxxxxx> writes:
>
> > On 03/18, qianli zhao wrote:
> >>
> >> Hi,Oleg
> >>
> >> Thank you for your reply.
> >>
> >> >> When init sub-threads running on different CPUs exit at the same time,
> >> >> zap_pid_ns_processe()->BUG() may be happened.
> >>
> >> > and why do you think your patch can't prevent this?
> >>
> >> > Sorry, I must have missed something. But it seems to me that you are trying
> >> > to fix the wrong problem. Yes, zap_pid_ns_processes() must not be called in
> >> > the root namespace, and this has nothing to do with CONFIG_PID_NS.
> >>
> >> Yes, i try to fix this exception by test SIGNAL_GROUP_EXIT and call
> >> panic before setting PF_EXITING to prevent zap_pid_ns_processes()
> >> being called when init do_exit().
> >
> > Ah, I didn't notice your patch does atomic_dec_and_test(signal->live)
> > before exit_signals() which sets PF_EXITING. Thanks for correcting me.
> >
> > So yes, I was wrong, your patch can prevent this. Although I'd like to
> > recheck if every do-something-if-group-dead action is correct in the
> > case we have a non-PF_EXITING thread...
> >
> > But then I don't understand the SIGNAL_GROUP_EXIT check added by your
> > patch. Do we really need it if we want to avoid zap_pid_ns_processes()
> > when the global init exits?
> >
> >> In addition, the patch also protects the init process state to
> >> successfully get usable init coredump.
> >
> > Could you spell please?
> >
> > Does this connect to SIGNAL_GROUP_EXIT check? Do you mean that you want
> > to panic earlier, before other init's sub-threads exit?
>
> That is my understanding.
>
> As I understand it this patch has two purposes:
> 1. Avoid the BUG_ON in zap_pid_ns_processes when !CONFIG_PID_NS
> 2. panic as early as possible so exiting threads don't removing
> interesting debugging state.
>
>
> It is a bit tricky to tell if the movement of the decrement of
> signal->live is safe. That affects current_is_single threaded
> which is used by unshare, setns of the time namespace, and setting
> the selinux part of creds.
>
> The usage in kernel/cgroup/cgroup.c:css_task_iter_advance seems safe.
> Hmm, Maybe not. Today cgroup_thread_change_begin is held around
> setting PF_EXITING before signal->live is decremented. So there seem to
> be some subtle cgroup dependencies.
>
> The usages of group_dead in do_exit seem safe, as except for the new
> one everything is the same.
>
> We could definitely take advantage of knowing group_dead in exit_signals
> to simplify it's optimization to not rerouting signals to living
> threads.
>
>
> I think if we are going to move the decrement of signal->live that
> should be it's own patch and be accompanied with a good description of
> why it is safe instead of having the decrement of signal->live be there
> as a side effect of another change.
>
> Eric