[PATCH v2] kernel/exit: panic earlier to get coredump if global init task exits

From: qiwuchen55
Date: Sun Dec 15 2019 - 22:19:06 EST


From: chenqiwu <chenqiwu@xxxxxxxxxx>

When the global init task is killed, the panic currently happens late in
the exit path, in do_exit()->exit_notify()->forget_original_parent()
->find_child_reaper(), once all init threads have exited.

However, it is hard to extract a coredump of the init task from a kernel
crashdump, since exit_mm() has already released its mm before the panic.
To get the userspace backtrace of the init task, it is better to panic
earlier, at the beginning of the exit path.

Note that we must take care of the multi-threaded init case. The test
for is_global_init() && group_dead ensures that all init threads are
exiting, not just the current thread.
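
As an illustration only (not part of the patch), the condition can be
modelled in userspace with C11 atomics. The names model_signal,
model_dec_and_test() and model_should_panic() are hypothetical stand-ins
for signal_struct, atomic_dec_and_test() and the new check in do_exit();
the sketch assumes the live counter starts at the number of threads in
the group.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for signal_struct; only the live counter is modelled. */
struct model_signal {
	atomic_int live;	/* threads of the group that have not started exiting */
};

/*
 * Same contract as the kernel's atomic_dec_and_test(): decrement the
 * counter and return true only if the new value is zero.
 */
static bool model_dec_and_test(atomic_int *v)
{
	assert(atomic_load(v) > 0);
	return atomic_fetch_sub(v, 1) == 1;
}

/*
 * Models the check added at the top of do_exit(): the counter is always
 * decremented, but the panic fires only for the last thread of global init.
 */
static bool model_should_panic(struct model_signal *sig, bool is_global_init)
{
	bool group_dead = model_dec_and_test(&sig->live);

	return is_global_init && group_dead;
}
```

With a three-threaded init, the first two exiting threads leave the
counter above zero and do not panic; only the third one does. A
single-threaded task that is not global init never triggers the panic.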

Signed-off-by: chenqiwu <chenqiwu@xxxxxxxxxx>
---
changes in v2:
- use is_global_init() && group_dead as the panic condition.
- move up group_dead = atomic_dec_and_test(&tsk->signal->live).
- add a comment for this change in do_exit().
---
kernel/exit.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index bcbd598..33364c8 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -517,10 +517,6 @@ static struct task_struct *find_child_reaper(struct task_struct *father,
}

write_unlock_irq(&tasklist_lock);
- if (unlikely(pid_ns == &init_pid_ns)) {
- panic("Attempted to kill init! exitcode=0x%08x\n",
- father->signal->group_exit_code ?: father->exit_code);
- }

list_for_each_entry_safe(p, n, dead, ptrace_entry) {
list_del_init(&p->ptrace_entry);
@@ -728,6 +724,14 @@ void __noreturn do_exit(long code)
panic("Attempted to kill the idle task!");

/*
+ * If all threads of global init have exited, panic immediately so
+ * that the crashdump still contains init's userspace coredump.
+ */
+ group_dead = atomic_dec_and_test(&tsk->signal->live);
+ if (unlikely(is_global_init(tsk) && group_dead))
+ panic("Attempted to kill init! exitcode=0x%08lx\n", code);
+
+ /*
* If do_exit is called because this processes oopsed, it's possible
* that get_fs() was left as KERNEL_DS, so reset it to USER_DS before
* continuing. Amongst other possible reasons, this is to prevent
@@ -764,7 +768,6 @@ void __noreturn do_exit(long code)
if (tsk->mm)
sync_mm_rss(tsk->mm);
acct_update_integrals(tsk);
- group_dead = atomic_dec_and_test(&tsk->signal->live);
if (group_dead) {
#ifdef CONFIG_POSIX_TIMERS
hrtimer_cancel(&tsk->signal->real_timer);
--
1.9.1