Re: [PATCH 0/5] mm/memcg: Reduce kmemcache memory accounting overhead

From: Waiman Long
Date: Mon Apr 12 2021 - 15:21:04 EST


On 4/12/21 1:47 PM, Roman Gushchin wrote:
On Mon, Apr 12, 2021 at 10:03:13AM -0400, Waiman Long wrote:
On 4/9/21 9:51 PM, Roman Gushchin wrote:
On Fri, Apr 09, 2021 at 07:18:37PM -0400, Waiman Long wrote:
With the recent introduction of the new slab memory controller, we
eliminate the need for having separate kmemcaches for each memory
cgroup and reduce overall kernel memory usage. However, we also add
additional memory accounting overhead to each call of kmem_cache_alloc()
and kmem_cache_free().

Workloads that require a lot of kmemcache allocations and
de-allocations may experience a performance regression as illustrated
in [1].

With a simple kernel module that performs a repeated loop of 100,000,000
kmem_cache_alloc() and kmem_cache_free() calls on a 64-byte object at
module init, the execution times to load the kernel module with and
without memory accounting were:

with accounting = 6.798s
w/o accounting = 1.758s

That is an increase of 5.04s (287%). With this patchset applied, the
execution time became 4.254s. So the memory accounting overhead is now
2.496s which is a 50% reduction.
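
The percentages above can be rechecked from the three quoted timings. This is only a sketch of the arithmetic; the variable names are illustrative and not taken from the patchset.

```python
# Timings quoted in the cover letter, in seconds.
with_accounting = 6.798
without_accounting = 1.758
patched = 4.254  # with accounting, patchset applied

# Accounting overhead is the extra time relative to the unaccounted run.
overhead_before = with_accounting - without_accounting
overhead_after = patched - without_accounting

# Overhead relative to the unaccounted baseline, and how much of that
# overhead the patchset removes.
increase_pct = 100 * overhead_before / without_accounting
reduction_pct = 100 * (1 - overhead_after / overhead_before)

print(f"overhead before: {overhead_before:.3f}s (+{increase_pct:.0f}%)")
print(f"overhead after:  {overhead_after:.3f}s ({reduction_pct:.1f}% lower)")
```

The 50% figure refers to the accounting overhead, not the total run time, which drops from 6.798s to 4.254s.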
Hi Waiman!

Thank you for working on it, it's indeed very useful!
A couple of questions:
1) did your config include lockdep or not?
The test kernel is based on a production kernel config and so lockdep isn't
enabled.
2) do you have a (rough) estimate of how much each change contributes
to the overall reduction?
I should have had a better breakdown of the effect of the individual
patches. I reran the benchmarking module with turbo-boosting disabled to
reduce run-to-run variation. The execution times were:

Before patch: time = 10.800s (with memory accounting), 2.848s (w/o
accounting), overhead = 7.952s
After patch 2: time = 9.140s, overhead = 6.292s
After patch 3: time = 7.641s, overhead = 4.793s
After patch 5: time = 6.801s, overhead = 3.953s
Thank you! If there is a v2, I'd include this information in the commit logs.

Yes, I am planning to send out v2 with this information in the cover letter. I am just waiting a bit to see if there is more feedback.

-Longman


Patches 1 & 4 are preparatory patches that should not affect performance.

So the memory accounting overhead was reduced by about half.
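
The per-patch contribution can be derived from the timings in the breakdown. The sketch below assumes the "After patch N" rows are cumulative, so each row's saving is measured against the row before it; the dictionary and variable names are illustrative.

```python
# Run without accounting (seconds), turbo-boosting disabled.
baseline = 2.848

# Times with memory accounting, in the order the patches were applied.
times = {
    "before": 10.800,
    "patch 2": 9.140,
    "patch 3": 7.641,
    "patch 5": 6.801,
}

# Accounting overhead at each step.
overheads = {step: t - baseline for step, t in times.items()}
initial = overheads["before"]

# Saving attributed to each step, relative to the previous row
# (assumes the rows are cumulative).
savings = {}
prev = initial
for step, overhead in overheads.items():
    if step == "before":
        continue
    savings[step] = prev - overhead
    prev = overhead

for step, saved in savings.items():
    print(f"{step}: saves {saved:.3f}s ({100 * saved / initial:.1f}% of overhead)")

total_reduction_pct = 100 * (initial - prev) / initial
print(f"total reduction: {total_reduction_pct:.1f}%")
```

Under that assumption patch 2 gives the largest single saving, followed by patch 3 and then patch 5.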

BTW, the benchmark that I used is kind of the best-case behavior, as all updates are to the percpu stocks. Real workloads will likely have a certain amount of updates to the memcg charges and vmstats, so the performance benefit will be less.

Cheers,
Longman