Re: [PATCH v3 31/35] lib: add memory allocations report in show_mem()
From: Steven Rostedt
Date: Thu Feb 15 2024 - 18:26:16 EST
On Thu, 15 Feb 2024 18:16:48 -0500
Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> On Thu, 15 Feb 2024 18:07:42 -0500
> Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
>
> >      text      data      bss       dec      hex  filename
> >  29161847  18352730  5619716  53134293  32ac3d5  vmlinux.orig
> >  29162286  18382638  5595140  53140064  32ada60  vmlinux.memtag-off (+5771)
> >  29230868  18887662  5275652  53394182  32ebb06  vmlinux.memtag (+259889)
> >  29230746  18887662  5275652  53394060  32eba8c  vmlinux.memtag-default-on (+259767) dropped?
> >  29276214  18946374  5177348  53399936  32ed180  vmlinux.memtag-debug (+265643)
>
> If you plan on running this in production, and this increases the size of
> the text by 68k, have you measured the I$ pressure that this may induce?
> That is, what is the full overhead of having this enabled, as it could
> cause more instruction cache misses?
>
> I wonder if there have been measurements with it off. That is, having this
> configured in but default off still increases the text size by 68k. That
> can't be good on the instruction cache.
>
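
For reference, the per-section growth can be read straight off the size
table quoted above. A minimal Python sketch, with the figures copied from
that table; the SIZES and deltas names are only illustrative. The text
column is the one relevant to I$ footprint, while the "dec" totals in
parentheses also include data and bss.

```python
# (text, data, bss) in bytes, copied from the `size` output quoted above.
SIZES = {
    "vmlinux.orig":              (29161847, 18352730, 5619716),
    "vmlinux.memtag-off":        (29162286, 18382638, 5595140),
    "vmlinux.memtag":            (29230868, 18887662, 5275652),
    "vmlinux.memtag-default-on": (29230746, 18887662, 5275652),
    "vmlinux.memtag-debug":      (29276214, 18946374, 5177348),
}


def deltas(name, base="vmlinux.orig"):
    """Return (text, data, bss) growth of `name` relative to `base`, in bytes."""
    return tuple(new - old for new, old in zip(SIZES[name], SIZES[base]))


for name in SIZES:
    d_text, d_data, d_bss = deltas(name)
    total = d_text + d_data + d_bss
    print(f"{name:26s} text {d_text:+8d}  data {d_data:+8d}  "
          f"bss {d_bss:+8d}  total {total:+8d}")
```
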
I should have read the cover letter ;-) (someone pointed me to that on IRC):
> Performance overhead:
> To evaluate performance we implemented an in-kernel test executing
> multiple get_free_page/free_page and kmalloc/kfree calls with allocation
> sizes growing from 8 to 240 bytes with CPU frequency set to max and CPU
> affinity set to a specific CPU to minimize the noise. Below are results
> from running the test on Ubuntu 22.04.2 LTS with 6.8.0-rc1 kernel on
> 56 core Intel Xeon:

These are microbenchmarks. Were any larger benchmarks run? Microbenchmarks
do not always show I$ issues, because the benchmark itself warms up the
cache. In the bigger picture, the extra text could slow down other tasks by
causing more instruction cache misses.

Running larger benchmarks under perf and comparing the cache misses across
the different configs would give a better picture.

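
One way such a comparison could be scripted is sketched below in Python,
wrapping "perf stat" in its CSV mode (-x). This is only a sketch under
assumptions: the event names are the generic perf aliases and may not
exist on every CPU (check "perf list"), the helper names are made up for
illustration, and the workload to run is left open.

```python
import subprocess

# Assumed generic event aliases; availability depends on the CPU and perf build.
EVENTS = ["L1-icache-load-misses", "instructions"]


def parse_perf_csv(text):
    """Parse `perf stat -x,` output into {event: count}.

    CSV fields start with: counter value, unit, event name. Lines whose
    value is not a plain integer (e.g. "<not supported>") are skipped.
    """
    counts = {}
    for line in text.splitlines():
        fields = line.split(",")
        if len(fields) < 3:
            continue
        value, event = fields[0], fields[2]
        if value.isdigit():
            # Drop modifiers such as ":u" that perf may append to the name.
            counts[event.split(":")[0]] = int(value)
    return counts


def run_perf(cmd):
    """Run `cmd` (list of argv strings) under perf stat; return event counts."""
    res = subprocess.run(
        ["perf", "stat", "-x", ",", "-e", ",".join(EVENTS), "--"] + cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,  # perf stat reports on stderr by default
        text=True,
        check=True,
    )
    return parse_perf_csv(res.stderr)


def icache_mpki(counts):
    """Instruction cache misses per 1000 instructions."""
    return 1000.0 * counts["L1-icache-load-misses"] / counts["instructions"]
```

Calling icache_mpki(run_perf([...])) with the same workload on each kernel
config, over several runs, would give numbers that can be compared
directly between the builds.
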
>
>                         kmalloc             pgalloc
> (1 baseline)            6.764s              16.902s
> (2 default disabled)    6.793s (+0.43%)     17.007s (+0.62%)
> (3 default enabled)     7.197s (+6.40%)     23.666s (+40.02%)
> (4 runtime enabled)     7.405s (+9.48%)     23.901s (+41.41%)
> (5 memcg)               13.388s (+97.94%)   48.460s (+186.71%)
>
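
The percentages in the timing table are relative to the baseline row. A
small Python sketch of that arithmetic, using the timings quoted above
(the names are illustrative; since the published timings are rounded to
milliseconds, a recomputed figure can differ from the table in the last
digit):

```python
# Timings in seconds, copied from the quoted table: (kmalloc, pgalloc).
BASELINE = (6.764, 16.902)
RUNS = {
    "default disabled": (6.793, 17.007),
    "default enabled":  (7.197, 23.666),
    "runtime enabled":  (7.405, 23.901),
    "memcg":            (13.388, 48.460),
}


def overhead_pct(seconds, baseline):
    """Slowdown of `seconds` relative to `baseline`, in percent."""
    return (seconds / baseline - 1.0) * 100.0


for name, (kmalloc, pgalloc) in RUNS.items():
    print(f"{name:18s} kmalloc {overhead_pct(kmalloc, BASELINE[0]):+7.2f}%  "
          f"pgalloc {overhead_pct(pgalloc, BASELINE[1]):+7.2f}%")
```
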
> Memory overhead:
> Kernel size:
>
>         text      data       bss       dec    diff
> (1) 26515311  18890222  17018880  62424413
> (2) 26524728  19423818  16740352  62688898  264485
> (3) 26524724  19423818  16740352  62688894  264481
> (4) 26524728  19423818  16740352  62688898  264485
> (5) 26541782  18964374  16957440  62463596   39183
Similar to my builds.
>
> Memory consumption on a 56 core Intel CPU with 125GB of memory:
> Code tags: 192 kB
> PageExts: 262144 kB (256MB)
> SlabExts: 9876 kB (9.6MB)
> PcpuExts: 512 kB (0.5MB)
>
> Total overhead is 0.2% of total memory.
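
The 0.2% figure follows from summing the four items above. A quick Python
check, assuming "125GB" means 125 GiB (an assumption, the cover letter
does not say which unit is meant):

```python
# Per-item overhead in kB, copied from the quoted cover letter.
OVERHEAD_KB = {
    "Code tags": 192,
    "PageExts": 262144,
    "SlabExts": 9876,
    "PcpuExts": 512,
}

# Assumption: "125GB" is taken as 125 GiB, expressed here in kB.
TOTAL_MEM_KB = 125 * 1024 * 1024

total_kb = sum(OVERHEAD_KB.values())
pct = 100.0 * total_kb / TOTAL_MEM_KB

print(f"total overhead: {total_kb} kB ({total_kb / 1024:.1f} MB), "
      f"{pct:.2f}% of memory")
```
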
All this, and we are still worried about 4k for useful debugging :-/
-- Steve