Re: [PATCH FIX v0] mm: memcg/slab: Uncharge during kmem_cache_free_bulk()
From: Roman Gushchin
Date: Mon Oct 12 2020 - 22:42:02 EST
On Mon, Oct 12, 2020 at 09:03:26AM +0530, Bharata B Rao wrote:
> On Fri, Oct 09, 2020 at 11:45:51AM -0700, Roman Gushchin wrote:
> > On Fri, Oct 09, 2020 at 11:34:23AM +0530, Bharata B Rao wrote:
> >
> > Hi Bharata,
> >
> > > Object cgroup charging is done for all the objects during
> > > allocation, but during freeing, uncharging ends up happening
> > > for only one object in the case of bulk allocation/freeing.
> >
> > Yes, it's definitely a problem. Thank you for catching it!
> >
> > I'm curious, did you find it in the wild or by looking into the code?
>
> Found by looking at the code.
>
> >
> > >
> > > Fix this by having a separate call to uncharge all the
> > > objects from kmem_cache_free_bulk() and by modifying
> > > memcg_slab_free_hook() to take care of bulk uncharging.
> > >
> > > Signed-off-by: Bharata B Rao <bharata@xxxxxxxxxxxxx>
> >
> > Please, add:
> >
> > Fixes: 964d4bd370d5 ("mm: memcg/slab: save obj_cgroup for non-root slab objects")
> > Cc: stable@xxxxxxxxxxxxxxx
> >
> > > ---
> > > mm/slab.c | 2 +-
> > > mm/slab.h | 42 +++++++++++++++++++++++++++---------------
> > > mm/slub.c | 3 ++-
> > > 3 files changed, 30 insertions(+), 17 deletions(-)
> > >
> > > diff --git a/mm/slab.c b/mm/slab.c
> > > index f658e86ec8cee..5c70600d8b1cc 100644
> > > --- a/mm/slab.c
> > > +++ b/mm/slab.c
> > > @@ -3440,7 +3440,7 @@ void ___cache_free(struct kmem_cache *cachep, void *objp,
> > > memset(objp, 0, cachep->object_size);
> > > kmemleak_free_recursive(objp, cachep->flags);
> > > objp = cache_free_debugcheck(cachep, objp, caller);
> > > - memcg_slab_free_hook(cachep, virt_to_head_page(objp), objp);
> > > + memcg_slab_free_hook(cachep, &objp, 1);
> > >
> > > /*
> > > * Skip calling cache_free_alien() when the platform is not numa.
> > > diff --git a/mm/slab.h b/mm/slab.h
> > > index 6cc323f1313af..6dd4b702888a7 100644
> > > --- a/mm/slab.h
> > > +++ b/mm/slab.h
> > > @@ -345,30 +345,42 @@ static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
> > > obj_cgroup_put(objcg);
> > > }
> > >
> > > -static inline void memcg_slab_free_hook(struct kmem_cache *s, struct page *page,
> > > - void *p)
> > > +static inline void memcg_slab_free_hook(struct kmem_cache *s_orig,
> > > + void **p, int objects)
> > > {
> > > + struct kmem_cache *s;
> > > struct obj_cgroup *objcg;
> > > + struct page *page;
> > > unsigned int off;
> > > + int i;
> > >
> > > if (!memcg_kmem_enabled())
> > > return;
> > >
> > > - if (!page_has_obj_cgroups(page))
> > > - return;
> > > + for (i = 0; i < objects; i++) {
> > > + if (unlikely(!p[i]))
> > > + continue;
> > >
> > > - off = obj_to_index(s, page, p);
> > > - objcg = page_obj_cgroups(page)[off];
> > > - page_obj_cgroups(page)[off] = NULL;
> > > + page = virt_to_head_page(p[i]);
> > > + if (!page_has_obj_cgroups(page))
> > > + continue;
> > >
> > > - if (!objcg)
> > > - return;
> > > + if (!s_orig)
> > > + s = page->slab_cache;
> > > + else
> > > + s = s_orig;
> > >
> > > - obj_cgroup_uncharge(objcg, obj_full_size(s));
> > > - mod_objcg_state(objcg, page_pgdat(page), cache_vmstat_idx(s),
> > > - -obj_full_size(s));
> > > + off = obj_to_index(s, page, p[i]);
> > > + objcg = page_obj_cgroups(page)[off];
> > > + if (!objcg)
> > > + continue;
> > >
> > > - obj_cgroup_put(objcg);
> > > + page_obj_cgroups(page)[off] = NULL;
> > > + obj_cgroup_uncharge(objcg, obj_full_size(s));
> > > + mod_objcg_state(objcg, page_pgdat(page), cache_vmstat_idx(s),
> > > + -obj_full_size(s));
> > > + obj_cgroup_put(objcg);
> > > + }
> > > }
> > >
> > > #else /* CONFIG_MEMCG_KMEM */
> > > @@ -406,8 +418,8 @@ static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
> > > {
> > > }
> > >
> > > -static inline void memcg_slab_free_hook(struct kmem_cache *s, struct page *page,
> > > - void *p)
> > > +static inline void memcg_slab_free_hook(struct kmem_cache *s,
> > > + void **p, int objects)
> > > {
> > > }
> > > #endif /* CONFIG_MEMCG_KMEM */
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index 6d3574013b2f8..0cbe67f13946e 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -3091,7 +3091,7 @@ static __always_inline void do_slab_free(struct kmem_cache *s,
> > > struct kmem_cache_cpu *c;
> > > unsigned long tid;
> > >
> > > - memcg_slab_free_hook(s, page, head);
> > > + memcg_slab_free_hook(s, &head, 1);
> >
> > Hm, I wonder if it's better to teach do_slab_free() to handle the (cnt > 1) case correctly?
>
> Possible, but we will have to walk the detached freelist there while
> here it is much easier to just walk the array of objects?
>
> >
> > > redo:
> > > /*
> > > * Determine the currently cpus per cpu slab.
> > > @@ -3253,6 +3253,7 @@ void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
> > > if (WARN_ON(!size))
> > > return;
> > >
> > > + memcg_slab_free_hook(s, p, size);
> >
> > Then you don't need this change.
> >
> > Otherwise memcg_slab_free_hook() can be called twice for the same object. It's ok from
> > accounting correctness perspective, because the first call will zero out the objcg pointer,
> > but still much better to be avoided.
>
> Yes, for that one object there will be one additional uncharge call,
> but as you note it is handled by the !objcg check.
Got it. Thanks for the explanation!
Acked-by: Roman Gushchin <guro@xxxxxx>
Thanks!