Re: [PATCH V3] exit: trigger panic when global init has exited

From: qianli zhao
Date: Mon Mar 22 2021 - 23:15:17 EST


Hi Oleg,

> No, there is at least one alive init thread. If they all have exited, we have
> the thread which calls panic() above.

With the current logic, PF_EXITING is set by exit_signals() before the
panic() check. find_alive_thread() decides whether a thread is alive by
testing PF_EXITING on each thread in the group. Because the thread that
calls panic() already has PF_EXITING set, find_alive_thread() treats it
as dead too and returns NULL. In that case it is possible to trigger
zap_pid_ns_processes()->BUG().
========
exit_signals(tsk); /* sets PF_EXITING */
...
group_dead = atomic_dec_and_test(&tsk->signal->live);
if (group_dead) {
        if (unlikely(is_global_init(tsk)))
                /* PF_EXITING has already been set at this point */
                panic("Attempted to kill init! exitcode=0x%08x\n",
                      tsk->signal->group_exit_code ?: (int)code);
        ...
}

=======
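
To make the ordering argument concrete, here is a small user-space model
(a sketch only, not kernel code). It assumes find_alive_thread() does
nothing more than return the first thread without PF_EXITING. The names
model_task, model_find_alive_thread() and model_last_thread_seen_alive()
are made up for this illustration, and the flag value is arbitrary here.

```c
#include <stddef.h>

/* User-space model only; the flag value is arbitrary in this sketch. */
#define PF_EXITING 0x4u

/* Hypothetical, heavily simplified stand-in for struct task_struct. */
struct model_task {
        unsigned int flags;
};

/* Assumed behaviour of find_alive_thread(): first thread without PF_EXITING. */
struct model_task *model_find_alive_thread(struct model_task *threads, size_t n)
{
        for (size_t i = 0; i < n; i++)
                if (!(threads[i].flags & PF_EXITING))
                        return &threads[i];
        return NULL;
}

/*
 * Model the last init thread reaching the group_dead check.
 * flag_first != 0: current order, exit_signals() ran before the check.
 * flag_first == 0: patched order, the check runs before PF_EXITING is set.
 * Returns 1 if some thread still counts as alive at that point, else 0.
 */
int model_last_thread_seen_alive(int flag_first)
{
        struct model_task init_threads[2] = {
                { .flags = PF_EXITING }, /* the other thread already exited */
                { .flags = 0 },          /* the thread about to hit the check */
        };

        if (flag_first)
                init_threads[1].flags |= PF_EXITING;

        return model_find_alive_thread(init_threads, 2) != NULL;
}
```

In this model, model_last_thread_seen_alive(1) finds no alive thread, which
is the case I describe above, while model_last_thread_seen_alive(0) still
finds the thread that is about to call panic().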

> Why do you think so? It can affect _any_ code which runs under
> "if (group_dead)". Again, I don't see anything wrong, but I didn't even
> try to audit these code paths.

Yes, every place that checks "signal->live" may be affected. But even
before my change, any code that checks "signal->live" could observe a
different state (group_dead or not) depending on when it runs, and my
change does not alter that.
What my patch does change is the order in which "signal->live--" and
other state are updated (for example, signal->live relative to
PF_EXITING). That could cause problems in logic that depends on both of
these variables; that is my concern.
Of course, auditing every "signal->live--" path is definitely necessary;
it is just that the case above needs more attention.

Thanks

On Tue, Mar 23, 2021 at 12:37 AM Oleg Nesterov <oleg@xxxxxxxxxx> wrote:
>
> Hi,
>
> It seems that we don't understand each other.
>
> If we move atomic_dec_and_test(signal->live) and do
>
> if (group_dead && is_global_init)
> panic(...);
>
>
> before setting PF_EXITING like your patch does, then zap_pid_ns_processes()
> simply won't be called.
>
> Because:
>
> On 03/21, qianli zhao wrote:
> >
> > Hi,Oleg
> >
> > > How? Perhaps I missed something again, but I don't think this is possible.
> >
> > > zap_pid_ns_processes() simply won't be called, find_child_reaper() will
> > > see the !PF_EXITING thread which calls panic().
> >
> > > So I think this should be documented somehow, at least in the changelog.
> >
> > This problem occurs when both two init threads enter the do_exit,
> > One of the init thread is syscall sys_exit_group,and set SIGNAL_GROUP_EXIT
> > The other init thread perform ret_to_user()->get_signal() and found
> > SIGNAL_GROUP_EXIT is set,then do_group_exit()->do_exit(),since there
> > are no alive init threads it finally goes to
> > zap_pid_ns_processes()
>
> No, there is at least one alive init thread. If they all have exited, we have
> the thread which calls panic() above.
>
> > and BUG().
>
> so we don't need the SIGNAL_GROUP_EXIT check to avoid this BUG().
>
> What have I missed?
>
> Oleg.
>