Re: [PATCH 0/2] Fix memcg/memory.high in case kmem accounting is enabled

From: Tejun Heo
Date: Mon Aug 31 2015 - 13:03:17 EST


Hello,

On Mon, Aug 31, 2015 at 07:51:32PM +0300, Vladimir Davydov wrote:
...
> If we want to allow the slab/slub implementation to invoke try_charge
> wherever it wants, we need to introduce an asynchronous thread doing
> reclaim when a memcg is approaching its limit (or teach kswapd to do
> that).

In the long term, I think this is the way to go.
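
To make that concrete, here is a deliberately simplified userspace
sketch of what a kswapd-style per-memcg reclaimer could look like.
Every name in it is invented for illustration and none of it is actual
kernel code; the point is only that charges past a "high" watermark
still succeed but wake a reclaimer, and only the hard limit fails the
charge:

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy model of a memcg; every field and name here is invented. */
struct memcg {
        atomic_long usage;              /* pages currently charged */
        long high;                      /* watermark: wake the reclaimer */
        long max;                       /* hard limit: charges fail here */
        pthread_mutex_t lock;
        pthread_cond_t wakeup;
        bool reclaim_requested;
};

/* Stand-in for real reclaim work (LRU scanning, slab shrinking). */
static long reclaim_some_pages(struct memcg *cg, long nr)
{
        /* ... free up to nr pages, return how many were freed ... */
        (void)cg;
        (void)nr;
        return 0;
}

/* Background reclaimer in the spirit of kswapd: woken when usage
 * approaches the limit, works until usage drops back below "high". */
static void *memcg_reclaimer(void *arg)
{
        struct memcg *cg = arg;

        for (;;) {
                pthread_mutex_lock(&cg->lock);
                while (!cg->reclaim_requested)
                        pthread_cond_wait(&cg->wakeup, &cg->lock);
                cg->reclaim_requested = false;
                pthread_mutex_unlock(&cg->lock);

                while (atomic_load(&cg->usage) > cg->high)
                        if (!reclaim_some_pages(cg, 32))
                                break;  /* nothing reclaimable right now */
        }
        return NULL;
}

/* Charge path safe for callers that cannot sleep: it never reclaims
 * directly, it only wakes the reclaimer, and it only fails once the
 * hard limit would be exceeded. */
static bool memcg_try_charge(struct memcg *cg, long nr_pages)
{
        long usage = atomic_fetch_add(&cg->usage, nr_pages) + nr_pages;

        if (usage > cg->max) {
                atomic_fetch_sub(&cg->usage, nr_pages);
                return false;
        }
        if (usage > cg->high) {
                pthread_mutex_lock(&cg->lock);
                cg->reclaim_requested = true;
                pthread_cond_signal(&cg->wakeup);
                pthread_mutex_unlock(&cg->lock);
        }
        return true;
}

A thread running memcg_reclaimer() would be started per cgroup (or
shared, kswapd-style); what matters is that memcg_try_charge() callers
never need to care whether they are allowed to sleep.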

> That's a way to go, but what's the point in complicating things
> prematurely when it seems we can fix the problem by using a technique
> similar to the one behind memory.high?

Cuz we're now scattering workarounds across multiple places, and I'm
sure we'll add more try_charge() users (e.g. we want to fold tcp memcg
in under the same knobs). We'll then have to worry about the same
problem all over again and will inevitably miss some cases, leading to
subtle failures.
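
For reference, the memory.high technique being alluded to can be
modeled the other way around (again purely illustrative, reusing the
toy struct memcg and reclaim_some_pages() from the sketch above):
crossing the high watermark never fails the charge; the overage is
remembered against the charging task and paid back later, from a point
where sleeping and reclaiming are allowed.

/* Per-task bookkeeping; invented for the example. */
struct task_model {
        long nr_pages_over_high;
};

/* Charge that never fails at "high": note the overage and move on.
 * Only exceeding cg->max would make a charge fail. */
static void charge_defer_high(struct memcg *cg, struct task_model *tsk,
                              long nr_pages)
{
        long usage = atomic_fetch_add(&cg->usage, nr_pages) + nr_pages;

        if (usage > cg->high)
                tsk->nr_pages_over_high += usage - cg->high;
}

/* Called later from a safe, sleepable point: trim the recorded
 * overage back down. */
static void handle_over_high(struct memcg *cg, struct task_model *tsk)
{
        if (tsk->nr_pages_over_high) {
                reclaim_some_pages(cg, tsk->nr_pages_over_high);
                tsk->nr_pages_over_high = 0;
        }
}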

> Nevertheless, even if we introduced such a thread, it'd be just insane
> to allow slab/slub to blindly insert try_charge. Let me repeat the examples
> of SLAB/SLUB sub-optimal behavior caused by thoughtless usage of
> try_charge I gave above:
>
> - memcg knows nothing about NUMA nodes, so what's the point in failing
> !__GFP_WAIT allocations used by SLAB while inspecting NUMA nodes?
> - memcg knows nothing about high order pages, so what's the point in
> failing !__GFP_WAIT allocations used by SLUB to try to allocate a
> high order page?

Both are optimistic speculative actions, and as long as memcg can
guarantee that those requests will succeed under normal circumstances,
as the system-wide mm does, it isn't a problem.
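
To spell out why that is sufficient, the pattern in question looks
roughly like this, in illustrative C with an invented toy_alloc_pages()
primitive rather than the real slab code: the first attempt is a pure
optimization that is fully expected to fail quietly, and only the
fallback genuinely has to succeed, so a failed opportunistic charge
costs nothing as long as the fallback charge behaves the way the rest
of the system does.

#include <stdbool.h>
#include <stdlib.h>

/* Invented primitive for the example: "quiet" means no reclaim, no
 * retries, failure is expected and fine (the spirit of a !__GFP_WAIT /
 * __GFP_NORETRY attempt).  Here it just wraps malloc(). */
static void *toy_alloc_pages(int node, int order, bool quiet)
{
        (void)node;
        (void)quiet;
        return malloc((size_t)4096 << order);
}

/* SLUB-like behavior: speculatively try the preferred high order
 * first, then fall back to the minimum order, which is the only
 * attempt that actually needs to work.  SLAB's node inspection has
 * the same shape: try node-local quietly, then fall back to any node. */
static void *alloc_slab_like(int node, int pref_order, int min_order)
{
        void *page = toy_alloc_pages(node, pref_order, true);

        if (page)
                return page;
        return toy_alloc_pages(node, min_order, false);
}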

In general, we want to keep inside-cgroup behaviors as close to
system-wide behaviors as possible, scoped but equivalent in kind.
Doing things differently, while inevitable in certain cases, is likely
to get messy in the long term.

Thanks.

--
tejun