Re: [PATCH -mm 1/3] mm/oom_kill: remove the wrong fatal_signal_pending()

From: Oleg Nesterov
Date: Wed Sep 30 2015 - 09:51:01 EST


On 09/30, Tetsuo Handa wrote:
>
> David Rientjes wrote:
> > On Tue, 29 Sep 2015, Oleg Nesterov wrote:
> >
> > > The fatal_signal_pending() was added to suppress unnecessary "sharing
> > > same memory" message, but it can't 100% help anyway because it can be
> > > false-negative; SIGKILL can be already dequeued.
> > >
> > > And worse, it can be false-positive due to exec or coredump. exec is
> > > mostly fine, but coredump is not. It is possible that the group leader
> > > has the pending SIGKILL because its sub-thread originated the coredump,
> > > in this case we must not skip this process.
> > >
> > > We could probably add the additional ->group_exit_task check but this
> > > patch just removes fatal_signal_pending(), the extra "Kill process" is
> > > unlikely and doesn't really hurt.
>
> This fatal_signal_pending() check is about to be added by me because the OOM
> killer spams the kernel log when the mm struct which the OOM victim is using
> is shared by many threads. ( http://marc.info/?l=linux-mm&m=143256441501204 )

OK, I see, but it is wrong.

But I don't really understand "shared by many threads", I mean "threads" is
a confusing word. I guess you mean CLONE_VM processes, otherwise we shouldn't
see the additional spam.

And 1000 CLONE_VM processes + "and the lock dependency prevents all threads
except the OOM victim thread from terminating until they get TIF_MEMDIE flag"
look like a really pathological case...
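
To make the distinction above concrete, here is a minimal userspace sketch,
not the kernel code. The struct model_task type, its fields, and
kill_mm_sharers() are hypothetical stand-ins invented for illustration; they
only model the idea that the "sharing same memory" kill applies to tasks that
share the victim's mm but live in a different thread group, and that skipping
tasks with a pending SIGKILL can miss a process whose SIGKILL came from a
coredump rather than from the OOM killer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a task; not the kernel's task_struct. */
struct model_task {
	int pid;
	int tgid;             /* equal tgid means same thread group (process) */
	int mm_id;            /* equal mm_id means a shared address space */
	bool sigkill_pending; /* models what a pending-SIGKILL check would see */
};

/*
 * Count the tasks that would be sent SIGKILL because they share the
 * victim's mm.  Sub-threads of the victim are skipped, since killing the
 * victim's thread group already covers them; only CLONE_VM processes in
 * other thread groups qualify.  If skip_if_sigkill_pending is true, tasks
 * that already have SIGKILL pending are skipped too, which is the
 * behaviour this thread argues is wrong.
 */
static size_t kill_mm_sharers(const struct model_task *tasks, size_t n,
			      const struct model_task *victim,
			      bool skip_if_sigkill_pending)
{
	size_t killed = 0;

	for (size_t i = 0; i < n; i++) {
		const struct model_task *p = &tasks[i];

		if (p->mm_id != victim->mm_id)
			continue;	/* different address space */
		if (p->tgid == victim->tgid)
			continue;	/* same thread group as the victim */
		if (skip_if_sigkill_pending && p->sigkill_pending)
			continue;	/* may be a false positive */
		killed++;
	}
	return killed;
}
```

With one victim, one sub-thread of the victim, one plain CLONE_VM process,
one CLONE_VM process whose leader has SIGKILL pending because of a coredump,
and one unrelated process, the model kills two tasks without the check and
only one with it: the coredumping process is the one that gets missed.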

> > In addition, I'm really debating whether we need the "sharing same memory"
> > line or not. In the past, it has been helpful because there is no other
> > way to determine what the kernel has killed other than to leave an
> > artifact behind in the kernel log. I can imagine that this could easily
> > spam the kernel log, though, accompanied by oom killer messages that are
> > already very verbose. I wouldn't mind if the printk were removed
> > entirely.
> >
>
> I was waiting for your comment about whether you depend on
> the "sharing same memory" message with KERN_ERR level.
> ( http://marc.info/?l=linux-mm&m=144120389203133 )
>
> If nobody else objects, I think we can remove the "sharing same memory"
> message. ( http://marc.info/?l=linux-mm&m=144119325831959 )

OK, will you agree with v2 which also removes pr_warn?

Oleg.
