Re: [PATCH 0/5] mm/memcg: Reduce kmemcache memory accounting overhead

From: Roman Gushchin
Date: Mon Apr 12 2021 - 13:47:55 EST


On Mon, Apr 12, 2021 at 10:03:13AM -0400, Waiman Long wrote:
> On 4/9/21 9:51 PM, Roman Gushchin wrote:
> > On Fri, Apr 09, 2021 at 07:18:37PM -0400, Waiman Long wrote:
> > > With the recent introduction of the new slab memory controller, we
> > > eliminate the need for having separate kmemcaches for each memory
> > > cgroup and reduce overall kernel memory usage. However, we also add
> > > additional memory accounting overhead to each call of kmem_cache_alloc()
> > > and kmem_cache_free().
> > >
> > > Workloads that require a lot of kmemcache allocations and
> > > de-allocations may experience a performance regression, as illustrated
> > > in [1].
> > >
> > > I used a simple kernel module that performs a repeated loop of
> > > 100,000,000 kmem_cache_alloc() and kmem_cache_free() calls on a 64-byte
> > > object at module init. The execution times to load the kernel module
> > > with and without memory accounting were:
> > >
> > > with accounting = 6.798s
> > > w/o accounting = 1.758s
> > >
> > > That is an increase of 5.04s (287%). With this patchset applied, the
> > > execution time became 4.254s. So the memory accounting overhead is now
> > > 2.496s which is a 50% reduction.
> > Hi Waiman!
> >
> > Thank you for working on it, it's indeed very useful!
> > A couple of questions:
> > > 1) did your config include lockdep or not?
> The test kernel is based on a production kernel config and so lockdep isn't
> enabled.
> > 2) do you have a (rough) estimation how much each change contributes
> > to the overall reduction?
>
> I should have a better breakdown of the effect of individual patches. I
> reran the benchmarking module with turbo-boosting disabled to reduce
> run-to-run variation. The execution times were:
>
> Before patch: time = 10.800s (with memory accounting), 2.848s (w/o
> accounting), overhead = 7.952s
> After patch 2: time = 9.140s, overhead = 6.292s
> After patch 3: time = 7.641s, overhead = 4.793s
> After patch 5: time = 6.801s, overhead = 3.953s

Thank you! If there is a v2, I'd include this information in the commit logs.

>
> Patches 1 & 4 are preparatory patches that shouldn't affect performance.
>
> So the memory accounting overhead was reduced by about half.

This is really great!

Thanks!
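
For reference, the overhead figures quoted above can be rechecked with a
few lines of Python. This is only a sketch that recomputes the arithmetic
from the timings in this thread (overhead = time with accounting minus
time without it). The names BASELINE, timings and overhead() are
illustrative and not part of the patchset or the benchmark module.

```python
# Timings in seconds, copied from the turbo-boost-disabled runs quoted above.
BASELINE = 2.848  # execution time without memory accounting

# Execution time with memory accounting at each stage of the series.
timings = {
    "before patch": 10.800,
    "after patch 2": 9.140,
    "after patch 3": 7.641,
    "after patch 5": 6.801,
}


def overhead(total, baseline=BASELINE):
    """Accounting overhead: time with accounting minus time without it."""
    return round(total - baseline, 3)


overheads = {stage: overhead(t) for stage, t in timings.items()}

# Fraction of the original overhead removed by the full series.
reduction = 1 - overheads["after patch 5"] / overheads["before patch"]

for stage, value in overheads.items():
    print(f"{stage}: overhead = {value:.3f}s")
print(f"overall reduction = {reduction:.1%}")
```

The per-stage values match the overhead column in the quoted breakdown,
and the overall reduction comes out at roughly one half.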