Re: [PATCH 09/12] oom: remove PF_EXITING check completely

From: Oleg Nesterov
Date: Thu Jun 03 2010 - 18:13:22 EST


On 06/03, David Rientjes wrote:
>
> On Thu, 3 Jun 2010, Oleg Nesterov wrote:
>
> > On 06/02, David Rientjes wrote:
> > >
> > > On Thu, 3 Jun 2010, KOSAKI Motohiro wrote:
> > >
> > > > Currently, the PF_EXITING check is completely broken, because 1) it
> > > > only cares about the main thread and ignores sub-threads
> > >
> > > Then check the subthreads.
> > >
>
> Did you want to respond to this?

Please explain what you mean. There have already been a lot of discussions
about multithreading (mt) issues, and I do not know what you have in mind.

> > > It may ignore SIGKILL, but does not ignore fatal_signal_pending() being
> > > true
> >
> > Wrong.
> >
> > Unless the oom victim is exactly the thread which dumps the core,
> > fatal_signal_pending() won't be true for the dumper. Even if the
> > victim and the dumper are from the same group, this thread group
> > already has SIGNAL_GROUP_EXIT. And if they do not belong to the
> > same group, SIGKILL has even less effect.
> >
>
> I'm guessing at the relevance here because the changelog is extremely
> poorly worded (if I were Andrew I would have no idea how important this
> patch is based on the description, other than the alarmist words "... is
> completely broken"), but if we're concerned about the coredumper not being
> able to find adequate resources to allocate memory from, we can give it
> access to reserves specifically,

I don't think so. If oom-kill wants to kill the task which dumps the
core, it should stop the coredumping and exit.

> we don't need to go killing additional
> tasks which may have their own coredumpers.

Sorry, I can't understand this.

> That's an alternative solution as well, but I'm disagreeing with the
> approach here because this enforces absolutely no guarantee that the next
> task to be oom killed will be the coredumper; it's much more likely that
> we're just going to kill yet another task for the coredump. That task may
> have a coredumper too. Who knows.

Again, please explain this to me.

> > > Nacked-by: David Rientjes <rientjes@xxxxxxxxxx>
> >
> > Kosaki removes the code which only pretends to work, but it doesn't
> > and leads to problems.
> >
>
> LOL, this code doesn't pretend to work,
> ...
> certain code doesn't do a complete job in certain cases or it can
> introduce a deadlock in situations

OK, agreed. It is not that it never works.

Oleg.
