Re: [PATCH 3/4] OOM, PM: OOM killed task shouldn't escape PM suspend

From: Michal Hocko
Date: Mon Nov 10 2014 - 11:39:07 EST


On Thu 06-11-14 11:28:45, Tejun Heo wrote:
> On Thu, Nov 06, 2014 at 05:02:23PM +0100, Michal Hocko wrote:
[...]
> > > We're doing one thing for non-PM freezing and the other way around for
> > > PM freezing, which indicates one of the two directions is wrong.
> >
> > Because those two paths are quite different in their requirements. The
> > cgroup freezer only cares about freezing tasks and it doesn't have to
> > care about tasks accessing a possibly half suspended device on their way
> > out.
>
> I don't think the fundamental relationship between freezing and oom
> killing are different between the two and the failure to recognize
> that is what's leading to these weird issues.

I do not understand the above. Could you be more specific, please?
AFAIU cgroup freezer requires that no task will leak into userspace
while the cgroup is frozen. This is naturally true for the OOM path
whether the two are synchronized or not.
The PM freezer, on the other hand, requires that no task is _woken up_
after all tasks are frozen. This requires synchronization between the
freezer and OOM path because allocations are allowed also after tasks
are frozen.
What am I missing?

> > > Shouldn't it be that OOM killing happening while PM freezing is in
> > > progress cancels PM freezing rather than the other way around? Find a
> > > point in PM suspend/hibernation operation where everything must be
> > > stable, disable OOM killing there and check whether OOM killing
> > > happened inbetween and if so back out.
> >
> > This is freeze_processes AFAIU. I might be wrong of course but this is
> > the time since when nobody should be waking processes up because they
> > could access half suspended devices.
>
> No, you're doing it before freezing starts. The system is in no way
> in a quiescent state at that point.

You are right! Userspace shouldn't see any unexpected allocation
failures just because PM freezing is in progress. This whole process
should be transparent from userspace POV.

I am getting back to
oom_killer_lock();
error = try_to_freeze_tasks();
if (!error)
oom_killer_disable();
oom_killer_unlock();

Thanks!
--
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/