Re: [] Kernel coredump to a pipe is failing

From: Paul Smith
Date: Wed May 27 2009 - 16:24:17 EST

On Wed, 2009-05-27 at 22:04 +0200, Oleg Nesterov wrote:
> Forgot to mention, and we have problems with OOM. Not only the coredumping
> task can't be killed (and it can populate the memory via get_user_pages).
> The coredump just disables OOM, if select_bad_process() sees the PF_EXITING
> task with ->mm == NULL it returns -1.
> > This all needs more discussion, but imho for now something like
> > Paul's patch is the best workaround. Note that we have the same
> > dump_write() in binfmt_elf.c and binfmt_aout.c, perhaps it makes
> > sense to create coredump_file_write() helper in fs/exec.c.
> But I didn't notice Paul also reports the kernel panic:
> page:ffffe20010d63d00 flags:0x8000000000000001 mapping:0000000000000000 mapcount:0 \
> count:0 Trying to fix it up, but a reboot is needed
> Backtrace:
> Pid: 3346, comm: worker Tainted: P #4
> Call Trace:
> [<ffffffff80284fd4>] bad_page+0x74/0xc0
> [<ffffffff80286168>] free_hot_cold_page+0x248/0x2f0
> [<ffffffff802f4096>] free_wr_note_data+0x56/0x70
> [<ffffffff802a95c6>] kfree+0x86/0x100
> [<ffffffff802f4096>] free_wr_note_data+0x56/0x70
> [<ffffffff802f0991>] elf_core_dump+0x611/0x1160
> At first glance, this looks like a bug outside of coredump.c,
> we are trying to free PG_locked page?

This might be something different, or a side-effect that's not
understood; I haven't seen this happen again since I applied my change,
and I used to be able to make it happen every time within 2 or 3
invocations of my "failing" core dump procedure. Now I have dumped core
using my "failing" procedure 10-15 times in a row with no ill effects.

I'll keep an eye out for this one though.

