Re: [PATCH v6 00/19] The new cgroup slab memory controller

From: Roman Gushchin
Date: Sun Jun 21 2020 - 19:35:48 EST

On Sun, Jun 21, 2020 at 06:57:52PM -0400, Qian Cai wrote:
> On Mon, Jun 08, 2020 at 04:06:35PM -0700, Roman Gushchin wrote:
> > This is v6 of the slab cgroup controller rework.
> >
> > The patchset moves the accounting from the page level to the object
> > level. It allows to share slab pages between memory cgroups.
> > This leads to a significant win in the slab utilization (up to 45%)
> > and the corresponding drop in the total kernel memory footprint.
> > The reduced number of unmovable slab pages should also have a positive
> > effect on the memory fragmentation.
> >
> > The patchset makes the slab accounting code simpler: there is no more
> > need in the complicated dynamic creation and destruction of per-cgroup
> > slab caches, all memory cgroups use a global set of shared slab caches.
> > The lifetime of slab caches is not more connected to the lifetime
> > of memory cgroups.
> >
> > The more precise accounting does require more CPU, however in practice
> > the difference seems to be negligible. We've been using the new slab
> > controller in Facebook production for several months with different
> > workloads and haven't seen any noticeable regressions. What we've seen
> > were memory savings in order of 1 GB per host (it varied heavily depending
> > on the actual workload, size of RAM, number of CPUs, memory pressure, etc).
> >
> > The third version of the patchset added yet another step towards
> > the simplification of the code: sharing of slab caches between
> > accounted and non-accounted allocations. It comes with significant
> > upsides (most noticeable, a complete elimination of dynamic slab caches
> > creation) but not without some regression risks, so this change sits
> > on top of the patchset and is not completely merged in. So in the unlikely
> > event of a noticeable performance regression it can be reverted separately.
> Reverting this series and its dependency [1], i.e.,
> git revert --no-edit 05923a2ccacd..07666ee77fb4
> on the top of next-20200621 fixed an issue where kmemleak could report
> thousands of leaks like this below using this .config (if ever matters),
> unreferenced object 0xffff888ff2bf6200 (size 512):

Hi Qian!

My wild guess is that kmemleak is getting confused by modifying the lowest
bit of page->mem_cgroup/obhj_cgroups pointer:

struct page {
union {
struct mem_cgroup *mem_cgroup;
struct obj_cgroup **obj_cgroups;

We're using the lowest bit to distinguish between a "normal" mem_cgroup
pointer and a vector of obj_cgroup pointers.

This pointer to obj_cgroup vector is saved only here, so if we're modifying
the address, I guess it's what makes kmemleak think that there is a leak.

Or do you have a real leak?