Re: [PATCH] kernel/exit: do panic earlier to get coredump if global init task exit
From: Christian Brauner
Date: Sat Dec 14 2019 - 06:57:07 EST
On Sat, Dec 14, 2019 at 02:27:04PM +0800, chenqiwu wrote:
> On Thu, Dec 12, 2019 at 12:05:14PM +0100, Christian Brauner wrote:
> > On Thu, Dec 12, 2019 at 11:08:38AM +0100, Oleg Nesterov wrote:
> > > can't you use is_global_init() && group_dead ?
> >
> > Seems reasonable.
> > Looks like we can move
> > group_dead = atomic_dec_and_test(&tsk->signal->live);
> > further up...
> >
> > (Ideally I'd like to have a test for this to ensure that this lets you
> > capture a global init coredump but that might be tricky. But since you've
> > seem to have run into this case maybe you even have something that could
> > be turned into a test? (Similar to how we already have a purely opt-in
> > test for pstore.))
> >
> > Christian
>
> Hi all,
> I agree that using is_global_init() && group_dead is more reasonable.
>
> The crash isuee happened on a Android phone by reboot stress test.
> panic log:
> [ 84.048521] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
> [ 84.048521]
> [ 84.048540] CPU: 2 PID: 1 Comm: init Tainted: G S O 4.14.117-perf-g8035d1a #1
> [ 84.048544] Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150 RAPHAEL (DT)
> [ 84.048550] Call trace:
> [ 84.048564] dump_backtrace+0x0/0x268
> [ 84.048569] show_stack+0x14/0x20
> [ 84.048577] dump_stack+0xc4/0x100
> [ 84.048584] panic+0x1f0/0x410
> [ 84.048591] complete_and_exit+0x0/0x20
> [ 84.048596] do_group_exit+0x8c/0xa0
> [ 84.048602] get_signal+0x1c0/0x790
> [ 84.048608] do_notify_resume+0x184/0xc30
> [ 84.048613] work_pending+0x8/0x10
>
> From the kdump loaded by crash utility, all threads of global init have exited,
> the group_dead value of global init has truned to 0 by atomic_dec_and_test().
> crash> ps init
> PID PPID CPU TASK ST %MEM VSZ RSS COMM
> > 1 0 2 ffffffcd77526000 ?? 0.0 0 0 init
> 534 1 4 ffffffcd6b9a9000 ZO 0.0 0 0 init
> 535 1 4 ffffffcd6b9aa000 ZO 0.0 0 0 init
> crash> ps -g 1
> PID: 1 TASK: ffffffcd77526000 CPU: 2 COMMAND: "init"
> (no threads)
> crash> struct task_struct.signal ffffffcd77526000
> signal = 0xffffffcd77530000
> crash> struct signal_struct 0xffffffcd77530000
> struct signal_struct {
> sigcnt = {
> counter = 1
> },
> live = {
> counter = 0
> },
> nr_threads = 1,
> thread_head = {
> next = 0xffffffcd77526730,
> prev = 0xffffffcd77526730
> },
> group_exit_code = 11,
> notify_count = 0,
> group_exit_task = 0x0,
> group_stop_count = 0,
> flags = 4,
> ...
> }
>
> However, as Christian said, the test for this is tricky since we must
> make sure all of init threads exited. I make a test for is_global_init()
> and send a SIGSEGV signal to global init task in userspace. The phone
> crash imeddiately and reboot to collect kdump. Then I extract the coredump
> of global init task from kdump successfully.
> (gdb) core-file core.1.init
> Core was generated by `/system/bin/init second_stage'.
> #0 _exit () at bionic/libc/arch-arm64/syscalls/_exit.S:9
> 9 cmn x0, #(MAX_ERRNO + 1)
> (gdb) bt
> #0 _exit () at bionic/libc/arch-arm64/syscalls/_exit.S:9
> #1 0x00000055606db11c in android::init::InstallRebootSignalHandlers()::$_14::operator()(int) const (this=<optimized out>, signal=11)
> at system/core/init/reboot_utils.cpp:141
> #2 0x00000055606db100 in android::init::InstallRebootSignalHandlers()::$_14::__invoke(int) (signal=11) at system/core/init/reboot_utils.cpp:138
> #3 0x0000007f8de236a0 in ?? ()
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>
> So from the following test, we have confidence that the following patch can help us
Thanks. Can you please resend this as a proper patch for v2?
Christian