Re: [patch 2/2] memcg: do not sleep on OOM waitqueue with full charge context

From: David Rientjes
Date: Tue Jun 11 2013 - 17:57:16 EST


On Thu, 6 Jun 2013, Johannes Weiner wrote:

> > Could you point me to those bug reports? As far as I know, we have never
> > encountered them so it would be surprising to me that we're running with a
> > potential landmine and have seemingly never hit it.
>
> Sure thing: https://lkml.org/lkml/2012/11/21/497
>

Ok, I think I read most of it, although the lkml.org interface makes it
easy to miss some.

> During that thread Michal pinned down the problem to i_mutex being
> held by the OOM invoking task, which the selected victim is trying to
> acquire.
>
> > > > > Reported-by: azurIt <azurit@xxxxxxxx>

Ok, so the key here is that azurIt was able to reliably reproduce this
issue and now it has been resurrected after seven months of silence since
that thread. I also notice that azurIt isn't cc'd on this thread. Do we
know if this is still a problem?

We certainly haven't run into any memcg deadlocks like this.

> > It certainly would, but it's not the point that memory.oom_delay_millisecs
> > was intended to address. memory.oom_delay_millisecs would simply delay
> > calling mem_cgroup_out_of_memory() unless userspace can't free memory or
> > increase the memory limit in time. Obviously that delay isn't going to
> > magically address any lock dependency issues.
>
> The delayed fallback would certainly resolve the issue of the
> userspace handler getting stuck, be it due to memory shortness or due
> to locks.
>
> However, it would not solve the part of the problem where the OOM
> killing kernel task is holding locks that the victim requires to exit.
>

Right.

> We are definitely looking at multiple related issues, that's why I'm
> trying to fix them step by step.
>

I guess my question is why this is being addressed now, when nobody has
reported it recently on any recent kernel, and why the person who
originally reported it isn't cc'd?

Can anybody, even with an instrumented kernel to make it more probable,
reproduce the issue this is addressing?