Re: [PATCH v2 0/4] kernel.h further split

From: Greg Kroah-Hartman
Date: Thu Oct 07 2021 - 06:34:07 EST


On Thu, Oct 07, 2021 at 12:51:25PM +0300, Andy Shevchenko wrote:
> kernel.h is a collection of unrelated things that are often used in
> unrelated compilation units; many drivers need only one or two macro
> definitions from it.
>
> Here is the split of container_of(). The goals are the following:
> - untwist the dependency hell a bit
> - drop kernel.h inclusion where it's only used for container_of()
> - speed up C preprocessing.
>
> Greg KH and Miguel Ojeda, among others, asked about the last point.
> The methodology, test setup, and resulting numbers are described below.
>
> The methodology
> ===============
> The question here is how to measure, in a more or less clean way,
> the C preprocessing time when building a project like the Linux kernel.
> To answer it, let's look around and see what tools we have that
> may help. The ccache tool looks like a plausible choice. Its core
> idea is to preprocess a C file, compute a hash (MD4) of the result,
> and compare it to the hashes already in the cache. If a match is found,
> it returns the cached object file and skips the compilation stage.
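The preprocess-then-hash idea described above can be modelled in a few lines. This is only a toy sketch, not ccache's actual code: `ToyCompilerCache` and `compile_fn` are hypothetical names, and SHA-256 stands in for the MD4 hash mentioned above because MD4 is not guaranteed to be available in Python's hashlib.

```python
import hashlib


class ToyCompilerCache:
    """Toy model of a preprocess-then-hash compiler cache (not ccache itself)."""

    def __init__(self, compile_fn):
        # compile_fn is a hypothetical callable: preprocessed text -> object bytes
        self._compile_fn = compile_fn
        self._objects = {}  # digest of preprocessed text -> cached object bytes
        self.hits = 0
        self.misses = 0

    def build(self, preprocessed_source):
        # Hash the preprocessor output; SHA-256 is a stand-in for MD4 here.
        digest = hashlib.sha256(preprocessed_source.encode("utf-8")).hexdigest()
        if digest in self._objects:
            # Cache hit: return the stored object and skip compilation.
            self.hits += 1
            return self._objects[digest]
        # Cache miss: compile once and remember the result.
        self.misses += 1
        obj = self._compile_fn(preprocessed_source)
        self._objects[digest] = obj
        return obj
```

In this model a warm cache still has to produce and hash the preprocessed text on every build, which is the property the methodology relies on.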
>
> Given this property of ccache, configure it and use it in the
> following steps:
>
> 1. Configure kernel with allyesconfig
>
> 2. Build it with `make` so that the cache is filled with
> the latest data, i.e. warm up the cache.
>
> 3. Run `make -s` (silent mode to reduce the influence of
> the unrelated things, like console output) 10 times and
> measure 'real' time spent.
>
> 4. Repeat 1-3 for each patch or patch set to get data sets before
> and after.
>
> Once we have the raw data, the median of each data set gives us a
> representative number. Comparing the medians before and after shows
> the difference.
>
> The setup
> =========
> I have used the Intel x86_64 server platform (see partial output of
> `lscpu` below):
>
> $ lscpu
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Address sizes: 46 bits physical, 48 bits virtual
> Byte Order: Little Endian
> CPU(s): 88
> On-line CPU(s) list: 0-87
> Vendor ID: GenuineIntel
> Model name: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> CPU family: 6
> Model: 79
> Thread(s) per core: 2
> Core(s) per socket: 22
> Socket(s): 2
> Stepping: 1
> CPU max MHz: 3600.0000
> CPU min MHz: 1200.0000
> ...
> Caches (sum of all):
> L1d: 1.4 MiB (44 instances)
> L1i: 1.4 MiB (44 instances)
> L2: 11 MiB (44 instances)
> L3: 110 MiB (2 instances)
> NUMA:
> NUMA node(s): 2
> NUMA node0 CPU(s): 0-21,44-65
> NUMA node1 CPU(s): 22-43,66-87
> Vulnerabilities:
> Itlb multihit: KVM: Mitigation: Split huge pages
> L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
> Mds: Mitigation; Clear CPU buffers; SMT vulnerable
> Meltdown: Mitigation; PTI
> Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
> Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
> Spectre v2: Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
> Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable
>
> With the following GCC:
>
> $ gcc --version
> gcc (Debian 10.3.0-11) 10.3.0
>
> The commands I have run during the measurement were:
>
> rm -rf $O
> make O=$O allyesconfig
> time make O=$O -s -j64 # this step has been measured
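Repeating the measured step 10 times, as in step 3 of the methodology, could be scripted roughly as follows. This is a minimal sketch: `time_runs` is a hypothetical helper, and it measures wall-clock time, which is assumed to correspond to the 'real' figure reported by `time`.

```python
import subprocess
import time


def time_runs(cmd, runs=10):
    """Run cmd `runs` times; return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True aborts the series if a run fails, so a broken build
        # does not end up in the data set as a suspiciously fast run.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations
```

For the measurement here it would be called as `time_runs(["make", "O=" + out_dir, "-s", "-j64"])`, where `out_dir` stands for the `$O` build directory.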
>
> The raw data and median
> =======================
> Before patch 2 in the series (yes, I have measured the effect of
> patch 2 only; the data is sorted by time):
>
> real 2m8.794s
> real 2m11.183s
> real 2m11.235s
> real 2m11.639s
> real 2m11.960s
> real 2m12.014s
> real 2m12.609s
> real 2m13.177s
> real 2m13.462s
> real 2m19.132s
>
> After patch 2 has been applied:
>
> real 2m8.536s
> real 2m8.776s
> real 2m9.071s
> real 2m9.459s
> real 2m9.531s
> real 2m9.610s
> real 2m10.356s
> real 2m10.430s
> real 2m11.117s
> real 2m11.885s
>
> Median values are:
> 131.987s before
> 129.571s after
>
> We see a steady speedup of about 1.83%.
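The medians and the speedup can be re-derived from the raw 'real' times quoted above. A small sketch: `parse_real` is a hypothetical helper, and the speedup is assumed to be the relative difference of the two medians.

```python
import re
from statistics import median


def parse_real(value):
    """Convert a `time`-style value such as '2m8.794s' to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", value.strip())
    if m is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    return int(m.group(1)) * 60 + float(m.group(2))


# Raw 'real' times copied from the two data sets above.
before = ["2m8.794s", "2m11.183s", "2m11.235s", "2m11.639s", "2m11.960s",
          "2m12.014s", "2m12.609s", "2m13.177s", "2m13.462s", "2m19.132s"]
after = ["2m8.536s", "2m8.776s", "2m9.071s", "2m9.459s", "2m9.531s",
         "2m9.610s", "2m10.356s", "2m10.430s", "2m11.117s", "2m11.885s"]

# With 10 samples the median is the mean of the 5th and 6th sorted values.
med_before = median(parse_real(v) for v in before)
med_after = median(parse_real(v) for v in after)

# Relative reduction of the median build time, in percent.
speedup_pct = (med_before - med_after) / med_before * 100

print(f"median before: {med_before:.3f}s")
print(f"median after:  {med_after:.3f}s")
print(f"speedup:       {speedup_pct:.2f}%")
```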

You do know about kcbench, right?
https://gitlab.com/knurd42/kcbench.git

Try running that, so that we know exactly how it was tested :)

thanks,

greg k-h


>
> Andy Shevchenko (4):
> kernel.h: Drop unneeded <linux/kernel.h> inclusion from other headers
> kernel.h: Split out container_of() and typeof_member() macros
> lib/rhashtable: Replace kernel.h with the necessary inclusions
> kunit: Replace kernel.h with the necessary inclusions
>
> include/kunit/test.h | 14 ++++++++++++--
> include/linux/container_of.h | 37 ++++++++++++++++++++++++++++++++++++
> include/linux/kernel.h | 31 +-----------------------------
> include/linux/kobject.h | 1 +
> include/linux/list.h | 6 ++++--
> include/linux/llist.h | 4 +++-
> include/linux/plist.h | 5 ++++-
> include/linux/rwsem.h | 1 -
> include/linux/spinlock.h | 1 -
> include/media/media-entity.h | 3 ++-
> lib/radix-tree.c | 6 +++++-
> lib/rhashtable.c | 7 ++++++-
> 12 files changed, 75 insertions(+), 41 deletions(-)
> create mode 100644 include/linux/container_of.h
>
> --
> 2.33.0
>