Re: [PATCH 0/2] Fix memcg/memory.high in case kmem accounting is enabled

From: Christoph Lameter
Date: Wed Sep 02 2015 - 14:16:53 EST

On Wed, 2 Sep 2015, Vladimir Davydov wrote:

> Slab is a kind of abnormal alloc_pages user. By calling alloc_pages_node
> with __GFP_THISNODE and w/o __GFP_WAIT before falling back to
> alloc_pages with the caller's context, it does the job normally done by
> alloc_pages itself. It's not something that is done widely.
> Leaving the slab charge path as is looks really ugly to me. Look, slab
> iterates over all nodes, checking whether they have free pages, and fails
> even if they do because of the memcg constraint...
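A minimal userspace sketch of the node walk described above, assuming a simplified model: each node has a free-page count, the memcg has a usage/limit pair, and there is no reclaim. Every name here (model_node, model_memcg, model_try_charge, model_slab_node_walk) is hypothetical and is not a kernel function; the code only illustrates how a per-node attempt can be rejected by the memcg limit even though the node still has pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model, not kernel code. */
#define MODEL_NR_NODES 4

struct model_node  { size_t free_pages; };
struct model_memcg { size_t usage, limit; }; /* both in pages */

/* Charge one page to the memcg; this model has no reclaim. */
static bool model_try_charge(struct model_memcg *mc)
{
    if (mc->usage >= mc->limit)
        return false;
    mc->usage++;
    return true;
}

/*
 * Mimics the walk: try each node under a "this node only, no waiting"
 * constraint. Returns the node used, or -1 when the caller would fall back
 * to a plain allocation with its own context. *attempts counts nodes tried.
 */
static int model_slab_node_walk(struct model_node nodes[], int nr,
                                struct model_memcg *mc, int *attempts)
{
    assert(nodes != NULL && mc != NULL && attempts != NULL);
    for (int nid = 0; nid < nr; nid++) {
        (*attempts)++;
        if (nodes[nid].free_pages == 0)
            continue;               /* node really is exhausted */
        if (!model_try_charge(mc))
            continue;               /* node has pages, but the memcg refuses */
        nodes[nid].free_pages--;
        return nid;
    }
    return -1;
}
```

With the memcg at its limit, every node is visited and rejected even though each still has free pages; only the final fallback could then try reclaim.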

Well, yes, it needs to do that because of the way NUMA support was designed in.
SLAB needs to check the per-node caches for objects before going to more
remote nodes. Sorry about this. I realized the design issue in 2006, and
SLUB was the result in 2007: an alternate design that lets the page
allocator do its proper job.
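A rough sketch of the SLAB-style lookup order mentioned above, assuming a toy model in which each node keeps a count of cached free objects and the caller supplies the nodes sorted nearest-first. The names (model_cache, model_slab_style_get) are hypothetical; the point is only that the allocator consults its own per-node caches in distance order before it ever asks the page allocator for a new slab.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, not kernel code. */
#define MODEL_NR_NODES 3

/* Free objects already cached on each node. */
struct model_cache { size_t free_objs[MODEL_NR_NODES]; };

/*
 * SLAB-like: walk per-node caches in the given (nearest-first) order and
 * take an object from the first non-empty one. Returns the node used, or
 * -1 meaning the caller must now go to the page allocator for a new slab.
 */
static int model_slab_style_get(struct model_cache *c,
                                const int order[MODEL_NR_NODES])
{
    assert(c != NULL && order != NULL);
    for (int i = 0; i < MODEL_NR_NODES; i++) {
        int nid = order[i];
        if (c->free_objs[nid] > 0) {
            c->free_objs[nid]--;
            return nid;
        }
    }
    return -1;
}
```

In a SLUB-style design the placement decision is left to the page allocator instead, so no allocator-private node walk of this kind is needed.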

> To sum it up. Basically, there are two ways of handling kmemcg charges:
> 1. Make the memcg try_charge mimic alloc_pages behavior.
> 2. Make API functions (kmalloc, etc) work in memcg as if they were
> called from the root cgroup, while keeping interactions between the
> low level subsys (slab) and memcg private.
> Way 1 might look appealing at first glance, but at the same time it
> is much more complex, because alloc_pages has grown over the years to
> handle a lot of subtle situations that may arise under global memory
> pressure but are impossible in memcg. What does way 1 give us, then? We
> can't insert try_charge directly into alloc_pages and have to spread its
> calls all over the code anyway, so why is it better? Is it easier to use in
> places where users depend on buddy allocator peculiarities? There are
> not many such users.
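To make "way 1" concrete, here is a hedged sketch of a charge function that starts re-implementing allocation policy from the caller's flags, assuming a toy memcg with a limit and a count of reclaimable pages. The flag names MODEL_GFP_WAIT and MODEL_GFP_NOFAIL are hypothetical stand-ins loosely inspired by gfp flags (they are not the kernel's values), and model_try_charge is not the real try_charge.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits; NOT the kernel's gfp values. */
#define MODEL_GFP_WAIT   0x1u  /* caller may sleep, so reclaim is allowed */
#define MODEL_GFP_NOFAIL 0x2u  /* caller cannot handle failure */

struct model_memcg {
    size_t usage;       /* pages currently charged */
    size_t limit;       /* limit in pages */
    size_t reclaimable; /* pages that reclaim could free in this model */
};

enum charge_result { CHARGE_OK, CHARGE_FORCED, CHARGE_FAIL };

/* Way 1 sketch: the charge path mirrors part of alloc_pages' decision making. */
static enum charge_result model_try_charge(struct model_memcg *mc,
                                           unsigned int gfp, size_t nr)
{
    assert(mc != NULL);
    if (mc->usage + nr <= mc->limit) {
        mc->usage += nr;
        return CHARGE_OK;
    }
    if (gfp & MODEL_GFP_WAIT) {
        /* Reclaim only as much as is needed and available. */
        size_t excess = mc->usage + nr - mc->limit;
        size_t freed = excess < mc->reclaimable ? excess : mc->reclaimable;
        mc->usage -= freed;
        mc->reclaimable -= freed;
        if (mc->usage + nr <= mc->limit) {
            mc->usage += nr;
            return CHARGE_OK;
        }
    }
    if (gfp & MODEL_GFP_NOFAIL) {
        mc->usage += nr;        /* overcharge rather than fail */
        return CHARGE_FORCED;
    }
    return CHARGE_FAIL;
}
```

Even this toy version has to know about sleeping and no-fail semantics; each further alloc_pages subtlety would need its own branch here, which is the complexity argument against way 1.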

Would it be possible to have a special alloc_pages_memcg with different

On the other hand, alloc_pages() has grown to handle all the special cases.
Why can't it also handle the special memcg case? There are numerous other
allocators that cache memory in the kernel, from networking to the bizarre
compressed swap approaches. How does memcg handle those? Isn't that
situation similar to what the slab allocators do?

> exists solely for memcg-vs-list_lru and memcg-vs-slab interactions. We
> even handle kmem_cache destruction on memcg offline differently for SLAB
> and SLUB for performance reasons.

Ugly. Internal allocator design impacts container handling.

> Way 2 gives us more space to maneuver IMO. SLAB/SLUB may do weird tricks
> for optimization, but their API is well defined, so we just make kmalloc
> work as expected while providing inter-subsys calls, like
> memcg_charge_slab, for SLAB/SLUB that have their own conventions. You
> mentioned kmem users that allocate memory using alloc_pages. There is an
> API function for them too, alloc_kmem_pages. Everything behind the API
> is hidden and may be done in whatever way achieves optimal performance.
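A minimal sketch of the "way 2" idea, assuming a toy memcg and using malloc in place of the page allocator: the charge/uncharge pairing lives entirely inside an API wrapper, so callers never see it. The names model_alloc_kmem_pages and model_free_kmem_pages are hypothetical stand-ins and do not reproduce the real alloc_kmem_pages implementation; model_page_alloc_fails is a test hook to simulate page allocator failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model, not kernel code. */
#define MODEL_PAGE_SIZE 4096u

struct model_memcg { size_t usage, limit; }; /* both in pages */

static bool model_page_alloc_fails; /* test hook: simulate allocator failure */

static bool model_charge(struct model_memcg *mc, size_t nr)
{
    if (mc->usage + nr > mc->limit)
        return false;
    mc->usage += nr;
    return true;
}

static void model_uncharge(struct model_memcg *mc, size_t nr)
{
    assert(mc->usage >= nr);
    mc->usage -= nr;
}

/* API wrapper: charge first, allocate, and undo the charge on failure. */
static void *model_alloc_kmem_pages(struct model_memcg *mc, unsigned int order)
{
    size_t nr = (size_t)1 << order;
    void *p;

    if (!model_charge(mc, nr))
        return NULL;                    /* memcg limit reached */
    p = model_page_alloc_fails ? NULL : malloc(nr * MODEL_PAGE_SIZE);
    if (!p)
        model_uncharge(mc, nr);         /* keep usage consistent */
    return p;
}

static void model_free_kmem_pages(struct model_memcg *mc, void *p,
                                  unsigned int order)
{
    free(p);
    model_uncharge(mc, (size_t)1 << order);
}
```

Because callers only see the wrapper, the charging strategy behind it (or a slab-private variant of it) can change without touching users.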

Can we also hide cgroup memory handling behind the page-based schemes
without having extra handling for the slab allocators?
