Re: [PATCH 1/4] drivers core: Introduce CPU type sysfs interface

From: Fox Chen
Date: Thu Nov 19 2020 - 03:25:50 EST


On Fri, Nov 13, 2020 at 2:15 PM Brice Goglin <brice.goglin@xxxxxxxxx> wrote:
>
>
> On 12/11/2020 at 11:49, Greg Kroah-Hartman wrote:
>
> On Thu, Nov 12, 2020 at 10:10:57AM +0100, Brice Goglin wrote:
>
> On 12/11/2020 at 07:42, Greg Kroah-Hartman wrote:
>
> On Thu, Nov 12, 2020 at 07:19:48AM +0100, Brice Goglin wrote:
>
> On 07/10/2020 at 07:15, Greg Kroah-Hartman wrote:
>
> On Tue, Oct 06, 2020 at 08:14:47PM -0700, Ricardo Neri wrote:
>
> On Tue, Oct 06, 2020 at 09:37:44AM +0200, Greg Kroah-Hartman wrote:
>
> On Mon, Oct 05, 2020 at 05:57:36PM -0700, Ricardo Neri wrote:
>
> On Sat, Oct 03, 2020 at 10:53:45AM +0200, Greg Kroah-Hartman wrote:
>
> On Fri, Oct 02, 2020 at 06:17:42PM -0700, Ricardo Neri wrote:
>
> Hybrid CPU topologies combine CPUs of different microarchitectures in the
> same die. Thus, even though the instruction set is compatible among all
> CPUs, there may still be differences in features (e.g., some CPUs may
> have counters that other CPUs do not). There may be applications
> interested in knowing the type of micro-architecture topology of the
> system to make decisions about process affinity.
>
> While the existing sysfs for capacity (/sys/devices/system/cpu/cpuX/
> cpu_capacity) may be used to infer the types of micro-architecture of the
> CPUs in the platform, it may not be entirely accurate. For instance, two
> subsets of CPUs with different types of micro-architecture may have the
> same capacity due to power or thermal constraints.
>
> Create the new directory /sys/devices/system/cpu/types. Under such
> directory, create individual subdirectories for each type of CPU micro-
> architecture. Each subdirectory will have cpulist and cpumap files. This
> makes it convenient for user space to read all the CPUs of the same type
> at once without having to inspect each CPU individually.
>
> Implement a generic interface using weak functions that architectures can
> override to indicate a) support for CPU types, b) the CPU type number, and
> c) a string to identify the CPU vendor and type.
>
> For example, an x86 system with one Intel Core and four Intel Atom CPUs
> would look like this (other architectures have the hooks to use whatever
> directory naming convention below "types" that meets their needs):
>
> user@host:~$ ls /sys/devices/system/cpu/types
> intel_atom_0 intel_core_0
>
> user@host:~$ ls /sys/devices/system/cpu/types/intel_atom_0
> cpulist cpumap
>
> user@host:~$ ls /sys/devices/system/cpu/types/intel_core_0
> cpulist cpumap
>
> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpumap
> 0f
>
> user@host:~$ cat /sys/devices/system/cpu/types/intel_atom_0/cpulist
> 0-3
>
> user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpumap
> 10
>
> user@host:~$ cat /sys/devices/system/cpu/types/intel_core_0/cpulist
> 4
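
As an illustration of how user space might consume the cpulist files shown
above, here is a minimal sketch in C. It is not part of the patch: the
function name parse_cpulist and the one-byte-per-CPU output array are
assumptions made for this example, and it only handles plain comma-separated
ranges such as "0-3" or "0-3,8".

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse a sysfs-style CPU list such as "0-3" or "0-3,8,10-11" and mark
 * each listed CPU in `present` (one byte per CPU, `max_cpus` entries).
 * Returns the number of distinct CPUs found, or -1 on malformed input
 * or if a CPU number does not fit in the array.
 * (Hypothetical helper for illustration; not a kernel or libc API.)
 */
int parse_cpulist(const char *s, unsigned char *present, int max_cpus)
{
    int count = 0;

    memset(present, 0, (size_t)max_cpus);
    while (*s && *s != '\n') {
        char *end;
        long lo, hi;

        /* Every entry must start with a decimal CPU number. */
        if (!isdigit((unsigned char)*s))
            return -1;
        lo = strtol(s, &end, 10);
        hi = lo;
        s = end;

        /* An optional "-N" turns the entry into an inclusive range. */
        if (*s == '-') {
            s++;
            if (!isdigit((unsigned char)*s))
                return -1;
            hi = strtol(s, &end, 10);
            s = end;
        }
        if (hi < lo || hi >= max_cpus)
            return -1;

        for (long cpu = lo; cpu <= hi; cpu++) {
            if (!present[cpu]) {
                present[cpu] = 1;
                count++;
            }
        }

        /* Entries are separated by commas; anything else is an error. */
        if (*s == ',')
            s++;
        else if (*s && *s != '\n')
            return -1;
    }
    return count;
}
```

With the example system above, reading intel_atom_0/cpulist once and passing
its contents to this function would mark CPUs 0 to 3, which is the work that
a per-cpuX attribute would instead spread over one file read per CPU.
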
>
> Thank you for the quick and detailed reply, Greg!
>
> The output of 'tree' sometimes makes it easier to see here, or:
> grep -R . *
> also works well.
>
> Indeed, this would definitely make it more readable.
>
> On non-hybrid systems, the /sys/devices/system/cpu/types directory is not
> created. Add a hook for this purpose.
>
> Why should these not show up if the system is not "hybrid"?
>
> My thinking was that on a non-hybrid system, it does not make sense to
> create this interface, as all the CPUs will be of the same type.
>
> Why not just have this as a type attribute in the existing cpuX directory?
> Why does this have to be a totally separate directory and userspace has to
> figure out to look in two different spots for the same cpu to determine
> what it is?
>
> But if the type is located under cpuX, userspace would need to traverse
> all the CPUs and create its own cpu masks. Under the types directory it
> would only need to look once for each type of CPU, IMHO.
>
> What does a "mask" do? What does userspace care about this? You would
> have to create it by traversing the directories you are creating anyway,
> so it's not much different, right?
>
> Hello
>
> Sorry for the late reply. As the first userspace consumer of this
> interface [1], I can confirm that reading a single file to get the mask
> would be better, at least for performance reasons. On large platforms, we
> already have to read thousands of sysfs files to get CPU topology and
> cache information; I'd be happy not to read one more file per cpu.
>
> Reading these sysfs files is slow, and it does not scale well when
> multiple processes read them in parallel.
>
> Really? Where is the slowdown? Would something like readfile() work
> better for you for that?
> https://lore.kernel.org/linux-api/20200704140250.423345-1-gregkh@xxxxxxxxxxxxxxxxxxx/
>
> I guess readfile would improve the sequential case by avoiding syscalls
> but it would not improve the parallel case since syscalls shouldn't have
> any parallel issue?
>
> syscalls should not have parallel issues at all.
>
> We've been watching the status of readfile() since it was posted on LKML
> 6 months ago, but we were actually wondering if it would end up being
> included at some point.
>
> It needs a solid reason to be merged. My "test" benchmarks are fun to
> run, but I have yet to find a real need for it anywhere as the
> open/read/close syscall overhead seems to be lost in the noise on any
> real application workload that I can find.
>
> If you have a real need, and it reduces overhead and cpu usage, I'm more
> than willing to update the patchset and resubmit it.
>
>
> Good, I'll give it a try.
>
>
> How do multiple processes slow anything down? There shouldn't be any
> shared locks here.
>
> When I benchmarked this in 2016, reading a single (small) sysfs file was
> 41x slower when running 64 processes simultaneously on a 64-core Knights
> Landing than reading from a single process. On a SGI Altix UV with 12x
> 8-core CPUs, reading from one process per CPU (12 total) was 60x slower
> (which could mean NUMA affinity matters), and reading from one process
> per core (96 total) was 491x slower.
>
> I will try to find some time to dig further on recent kernels with perf
> and readfile (both machines were running RHEL7).
>
> 2016 was a long time ago in kernel-land, please retest on a kernel.org
> release, not a RHEL monstrosity.
>
>
> Quick test on 5.8.14 from Debian (fairly close to mainline) on a server with 2x20 cores.
>
> I am measuring the time to do open+read+close of /sys/devices/system/cpu/cpu15/topology/die_id 1000 times.
>
> With a single process, it takes 2ms (2us per open+read+close, looks OK).
>
> With one process per core (with careful binding, etc.), it jumps from 2ms to 190ms (without much variation).
>
> It looks like locks in kernfs_iop_permission and kernfs_dop_revalidate are causing the issue.
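
For reference, the measurement described above can be approximated with a
sketch like the following. This is not the benchmark that produced the
numbers in this thread (that code is not shown here); the function names
time_open_read_close and time_parallel are assumptions for this example, CPU
binding is omitted, and the parallel timing includes fork/wait overhead.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Nanoseconds elapsed between two CLOCK_MONOTONIC samples. */
static long long elapsed_ns(const struct timespec *t0,
                            const struct timespec *t1)
{
    return (t1->tv_sec - t0->tv_sec) * 1000000000LL +
           (t1->tv_nsec - t0->tv_nsec);
}

/*
 * Run `iters` rounds of open+read+close on `path` in the calling process.
 * Returns elapsed wall-clock nanoseconds, or -1 if any call fails.
 */
long long time_open_read_close(const char *path, int iters)
{
    struct timespec t0, t1;
    char buf[64];

    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
        return -1;
    for (int i = 0; i < iters; i++) {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;
        ssize_t n = read(fd, buf, sizeof(buf));
        close(fd);
        if (n < 0)
            return -1;
    }
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
        return -1;
    return elapsed_ns(&t0, &t1);
}

/*
 * Fork `nproc` children that each run the loop above concurrently, and
 * wait for all of them. Returns total wall-clock nanoseconds, or -1 if
 * fork fails or any child reports an error.
 */
long long time_parallel(const char *path, int iters, int nproc)
{
    struct timespec t0, t1;
    int started = 0, failed = 0;

    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
        return -1;
    for (int p = 0; p < nproc; p++) {
        pid_t pid = fork();
        if (pid < 0) {
            failed = 1;
            break;
        }
        if (pid == 0)   /* child: run the loop, report status, exit */
            _exit(time_open_read_close(path, iters) < 0 ? 1 : 0);
        started++;
    }
    /* Reap every child that was started, even after a fork failure. */
    for (int p = 0; p < started; p++) {
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) ||
            WEXITSTATUS(status) != 0)
            failed = 1;
    }
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0 || failed)
        return -1;
    return elapsed_ns(&t0, &t1);
}
```

Comparing time_open_read_close(path, 1000) with time_parallel(path, 1000, N)
for N equal to the number of cores gives a rough view of the single-process
versus one-process-per-core cases, with the caveats listed above.
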
>
> I am attaching the perf report callgraph output below.
>
>
>
> There are ways to avoid these
> multiple discoveries by sharing hwloc info through XML or shmem, but it
> will take years before the developers of all the different runtimes
> implement this :)
>
> I don't understand, what exactly are you suggesting we do here instead?
>
> I was just saying userspace has ways to mitigate the issue but it will
> take time because many different projects are involved.
>
> I still don't understand, what issue are you referring to?
>
>
> Reading many sysfs files causes application startup to take many seconds when launching multiple processes at the same time.
>
> Brice
>
>
> # To display the perf.data header info, please use --header/--header-only options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 7K of event 'cycles'
> # Event count (approx.): 5291578622
> #
> # Children Self Command Shared Object Symbol
> # ........ ........ ............. ................. .......................................
> #
> 99.91% 0.00% fops_overhead [kernel.kallsyms] [k] entry_SYSCALL_64_after_hwframe
> |
> ---entry_SYSCALL_64_after_hwframe
> do_syscall_64
> |
> |--98.69%--__x64_sys_openat
> | |
> | --98.67%--do_sys_openat2
> | |
> | --98.57%--do_filp_open
> | path_openat
> | |
> | |--81.83%--link_path_walk.part.0
> | | |
> | | |--52.19%--inode_permission.part.0
> | | | |
> | | | --51.86%--kernfs_iop_permission
> | | | |
> | | | |--50.92%--__mutex_lock.constprop.0
> | | | | |
> | | | | --49.58%--osq_lock
> | | | |
> | | | --0.59%--mutex_unlock
> | | |
> | | --29.47%--walk_component
> | | |
> | | --29.10%--lookup_fast
> | | |
> | | --28.76%--kernfs_dop_revalidate
> | | |
> | | --28.29%--__mutex_lock.constprop.0
> | | |
> | | --27.65%--osq_lock
> | |
> | |--9.60%--lookup_fast
> | | |
> | | --9.50%--kernfs_dop_revalidate
> | | |
> | | --9.35%--__mutex_lock.constprop.0
> | | |
> | | --9.18%--osq_lock
> | |
> | |--6.17%--may_open
> | | |
> | | --6.13%--inode_permission.part.0
> | | |
> | | --6.10%--kernfs_iop_permission
> | | |
> | | --5.90%--__mutex_lock.constprop.0
> | | |
> | | --5.80%--osq_lock
> | |
> | --0.52%--do_dentry_open
> |
> --0.63%--__prepare_exit_to_usermode
> |
> --0.58%--task_work_run
>
> 99.91% 0.01% fops_overhead [kernel.kallsyms] [k] do_syscall_64
> |
> --99.89%--do_syscall_64
> |
> |--98.69%--__x64_sys_openat
> | |
> | --98.67%--do_sys_openat2
> | |
> | --98.57%--do_filp_open
> | path_openat
> | |
> | |--81.83%--link_path_walk.part.0
> | | |
> | | |--52.19%--inode_permission.part.0
> | | | |
> | | | --51.86%--kernfs_iop_permission
> | | | |
> | | | |--50.92%--__mutex_lock.constprop.0
> | | | | |
> | | | | --49.58%--osq_lock
> | | | |
> | | | --0.59%--mutex_unlock
> | | |
> | | --29.47%--walk_component
> | | |
> | | --29.10%--lookup_fast
> | | |
> | | --28.76%--kernfs_dop_revalidate
> | | |
> | | --28.29%--__mutex_lock.constprop.0
> | | |
> | | --27.65%--osq_lock
> | |
> | |--9.60%--lookup_fast
> | | |
> | | --9.50%--kernfs_dop_revalidate
> | | |
> | | --9.35%--__mutex_lock.constprop.0
> | | |
> | | --9.18%--osq_lock
> | |
> | |--6.17%--may_open
> | | |
> | | --6.13%--inode_permission.part.0
> | | |
> | | --6.10%--kernfs_iop_permission
> | | |
> | | --5.90%--__mutex_lock.constprop.0
> | | |
> | | --5.80%--osq_lock
> | |
> | --0.52%--do_dentry_open
> |
> --0.63%--__prepare_exit_to_usermode
> |
> --0.58%--task_work_run
>
> 98.72% 0.00% fops_overhead [unknown] [k] 0x7379732f73656369
> |
> ---0x7379732f73656369
> __GI___libc_open
> |
> --98.70%--entry_SYSCALL_64_after_hwframe
> do_syscall_64
> |
> --98.66%--__x64_sys_openat
> |
> --98.65%--do_sys_openat2
> |
> --98.55%--do_filp_open
> path_openat
> |
> |--81.80%--link_path_walk.part.0
> | |
> | |--52.16%--inode_permission.part.0
> | | |
> | | --51.86%--kernfs_iop_permission
> | | |
> | | |--50.92%--__mutex_lock.constprop.0
> | | | |
> | | | --49.58%--osq_lock
> | | |
> | | --0.59%--mutex_unlock
> | |
> | --29.47%--walk_component
> | |
> | --29.10%--lookup_fast
> | |
> | --28.76%--kernfs_dop_revalidate
> | |
> | --28.29%--__mutex_lock.constprop.0
> | |
> | --27.65%--osq_lock
> |
> |--9.60%--lookup_fast
> | |
> | --9.50%--kernfs_dop_revalidate
> | |
> | --9.35%--__mutex_lock.constprop.0
> | |
> | --9.18%--osq_lock
> |
> |--6.17%--may_open
> | |
> | --6.13%--inode_permission.part.0
> | |
> | --6.10%--kernfs_iop_permission
> | |
> | --5.90%--__mutex_lock.constprop.0
> | |
> | --5.80%--osq_lock
> |
> --0.52%--do_dentry_open
>
> 98.72% 0.00% fops_overhead libc-2.31.so [.] __GI___libc_open
> |
> ---__GI___libc_open
> |
> --98.70%--entry_SYSCALL_64_after_hwframe
> do_syscall_64
> |
> --98.66%--__x64_sys_openat
> |
> --98.65%--do_sys_openat2
> |
> --98.55%--do_filp_open
> path_openat
> |
> |--81.80%--link_path_walk.part.0
> | |
> | |--52.16%--inode_permission.part.0
> | | |
> | | --51.86%--kernfs_iop_permission
> | | |
> | | |--50.92%--__mutex_lock.constprop.0
> | | | |
> | | | --49.58%--osq_lock
> | | |
> | | --0.59%--mutex_unlock
> | |
> | --29.47%--walk_component
> | |
> | --29.10%--lookup_fast
> | |
> | --28.76%--kernfs_dop_revalidate
> | |
> | --28.29%--__mutex_lock.constprop.0
> | |
> | --27.65%--osq_lock
> |
> |--9.60%--lookup_fast
> | |
> | --9.50%--kernfs_dop_revalidate
> | |
> | --9.35%--__mutex_lock.constprop.0
> | |
> | --9.18%--osq_lock
> |
> |--6.17%--may_open
> | |
> | --6.13%--inode_permission.part.0
> | |
> | --6.10%--kernfs_iop_permission
> | |
> | --5.90%--__mutex_lock.constprop.0
> | |
> | --5.80%--osq_lock
> |
> --0.52%--do_dentry_open
>
> 98.69% 0.01% fops_overhead [kernel.kallsyms] [k] __x64_sys_openat
> |
> --98.67%--__x64_sys_openat
> do_sys_openat2
> |
> --98.57%--do_filp_open
> path_openat
> |
> |--81.83%--link_path_walk.part.0
> | |
> | |--52.19%--inode_permission.part.0
> | | |
> | | --51.86%--kernfs_iop_permission
> | | |
> | | |--50.92%--__mutex_lock.constprop.0
> | | | |
> | | | --49.58%--osq_lock
> | | |
> | | --0.59%--mutex_unlock
> | |
> | --29.47%--walk_component
> | |
> | --29.10%--lookup_fast
> | |
> | --28.76%--kernfs_dop_revalidate
> | |
> | --28.29%--__mutex_lock.constprop.0
> | |
> | --27.65%--osq_lock
> |
> |--9.60%--lookup_fast
> | |
> | --9.50%--kernfs_dop_revalidate
> | |
> | --9.35%--__mutex_lock.constprop.0
> | |
> | --9.18%--osq_lock
> |
> |--6.17%--may_open
> | |
> | --6.13%--inode_permission.part.0
> | |
> | --6.10%--kernfs_iop_permission
> | |
> | --5.90%--__mutex_lock.constprop.0
> | |
> | --5.80%--osq_lock
> |
> --0.52%--do_dentry_open
>
> 98.67% 0.03% fops_overhead [kernel.kallsyms] [k] do_sys_openat2
> |
> --98.65%--do_sys_openat2
> |
> --98.57%--do_filp_open
> path_openat
> |
> |--81.83%--link_path_walk.part.0
> | |
> | |--52.19%--inode_permission.part.0
> | | |
> | | --51.86%--kernfs_iop_permission
> | | |
> | | |--50.92%--__mutex_lock.constprop.0
> | | | |
> | | | --49.58%--osq_lock
> | | |
> | | --0.59%--mutex_unlock
> | |
> | --29.47%--walk_component
> | |
> | --29.10%--lookup_fast
> | |
> | --28.76%--kernfs_dop_revalidate
> | |
> | --28.29%--__mutex_lock.constprop.0
> | |
> | --27.65%--osq_lock
> |
> |--9.60%--lookup_fast
> | |
> | --9.50%--kernfs_dop_revalidate
> | |
> | --9.35%--__mutex_lock.constprop.0
> | |
> | --9.18%--osq_lock
> |
> |--6.17%--may_open
> | |
> | --6.13%--inode_permission.part.0
> | |
> | --6.10%--kernfs_iop_permission
> | |
> | --5.90%--__mutex_lock.constprop.0
> | |
> | --5.80%--osq_lock
> |
> --0.52%--do_dentry_open
>
> 98.57% 0.00% fops_overhead [kernel.kallsyms] [k] do_filp_open
> |
> ---do_filp_open
> path_openat
> |
> |--81.83%--link_path_walk.part.0
> | |
> | |--52.19%--inode_permission.part.0
> | | |
> | | --51.86%--kernfs_iop_permission
> | | |
> | | |--50.92%--__mutex_lock.constprop.0
> | | | |
> | | | --49.58%--osq_lock
> | | |
> | | --0.59%--mutex_unlock
> | |
> | --29.47%--walk_component
> | |
> | --29.10%--lookup_fast
> | |
> | --28.76%--kernfs_dop_revalidate
> | |
> | --28.29%--__mutex_lock.constprop.0
> | |
> | --27.65%--osq_lock
> |
> |--9.60%--lookup_fast
> | |
> | --9.50%--kernfs_dop_revalidate
> | |
> | --9.35%--__mutex_lock.constprop.0
> | |
> | --9.18%--osq_lock
> |
> |--6.17%--may_open
> | |
> | --6.13%--inode_permission.part.0
> | |
> | --6.10%--kernfs_iop_permission
> | |
> | --5.90%--__mutex_lock.constprop.0
> | |
> | --5.80%--osq_lock
> |
> --0.52%--do_dentry_open
>
> 98.57% 0.01% fops_overhead [kernel.kallsyms] [k] path_openat
> |
> --98.56%--path_openat
> |
> |--81.83%--link_path_walk.part.0
> | |
> | |--52.19%--inode_permission.part.0
> | | |
> | | --51.86%--kernfs_iop_permission
> | | |
> | | |--50.92%--__mutex_lock.constprop.0
> | | | |
> | | | --49.58%--osq_lock
> | | |
> | | --0.59%--mutex_unlock
> | |
> | --29.47%--walk_component
> | |
> | --29.10%--lookup_fast
> | |
> | --28.76%--kernfs_dop_revalidate
> | |
> | --28.29%--__mutex_lock.constprop.0
> | |
> | --27.65%--osq_lock
> |
> |--9.60%--lookup_fast
> | |
> | --9.50%--kernfs_dop_revalidate
> | |
> | --9.35%--__mutex_lock.constprop.0
> | |
> | --9.18%--osq_lock
> |
> |--6.17%--may_open
> | |
> | --6.13%--inode_permission.part.0
> | |
> | --6.10%--kernfs_iop_permission
> | |
> | --5.90%--__mutex_lock.constprop.0
> | |
> | --5.80%--osq_lock
> |
> --0.52%--do_dentry_open
>
> 94.52% 1.30% fops_overhead [kernel.kallsyms] [k] __mutex_lock.constprop.0
> |
> |--93.23%--__mutex_lock.constprop.0
> | |
> | |--92.23%--osq_lock
> | |
> | --0.55%--mutex_spin_on_owner
> |
> --1.30%--0x7379732f73656369
> __GI___libc_open
> entry_SYSCALL_64_after_hwframe
> do_syscall_64
> __x64_sys_openat
> do_sys_openat2
> do_filp_open
> path_openat
> |
> --1.09%--link_path_walk.part.0
> |
> --0.75%--inode_permission.part.0
> kernfs_iop_permission
> __mutex_lock.constprop.0
>
> 92.24% 92.22% fops_overhead [kernel.kallsyms] [k] osq_lock
> |
> --92.22%--0x7379732f73656369
> __GI___libc_open
> entry_SYSCALL_64_after_hwframe
> do_syscall_64
> __x64_sys_openat
> do_sys_openat2
> do_filp_open
> path_openat
> |
> |--77.21%--link_path_walk.part.0
> | |
> | |--49.57%--inode_permission.part.0
> | | kernfs_iop_permission
> | | __mutex_lock.constprop.0
> | | osq_lock
> | |
> | --27.64%--walk_component
> | lookup_fast
> | kernfs_dop_revalidate
> | __mutex_lock.constprop.0
> | osq_lock
> |
> |--9.18%--lookup_fast
> | kernfs_dop_revalidate
> | __mutex_lock.constprop.0
> | osq_lock
> |
> --5.80%--may_open
> inode_permission.part.0
> kernfs_iop_permission
> __mutex_lock.constprop.0
> osq_lock
>
> 81.83% 0.03% fops_overhead [kernel.kallsyms] [k] link_path_walk.part.0
> |
> --81.80%--link_path_walk.part.0
> |
> |--52.19%--inode_permission.part.0
> | |
> | --51.86%--kernfs_iop_permission
> | |
> | |--50.92%--__mutex_lock.constprop.0
> | | |
> | | --49.58%--osq_lock
> | |
> | --0.59%--mutex_unlock
> |
> --29.47%--walk_component
> |
> --29.10%--lookup_fast
> |
> --28.76%--kernfs_dop_revalidate
> |
> --28.29%--__mutex_lock.constprop.0
> |
> --27.65%--osq_lock
>
> 58.32% 0.24% fops_overhead [kernel.kallsyms] [k] inode_permission.part.0
> |
> --58.08%--inode_permission.part.0
> |
> --57.97%--kernfs_iop_permission
> |
> |--56.81%--__mutex_lock.constprop.0
> | |
> | --55.39%--osq_lock
> |
> --0.73%--mutex_unlock
>
> 57.97% 0.00% fops_overhead [kernel.kallsyms] [k] kernfs_iop_permission
> |
> ---kernfs_iop_permission
> |
> |--56.81%--__mutex_lock.constprop.0
> | |
> | --55.39%--osq_lock
> |
> --0.73%--mutex_unlock
>
> 38.71% 0.03% fops_overhead [kernel.kallsyms] [k] lookup_fast
> |
> --38.68%--lookup_fast
> |
> --38.26%--kernfs_dop_revalidate
> |
> --37.64%--__mutex_lock.constprop.0
> |
> --36.83%--osq_lock
>
> 38.26% 0.04% fops_overhead [kernel.kallsyms] [k] kernfs_dop_revalidate
> |
> --38.22%--kernfs_dop_revalidate
> |
> --37.64%--__mutex_lock.constprop.0
> |
> --36.83%--osq_lock
>
> 29.47% 0.03% fops_overhead [kernel.kallsyms] [k] walk_component
> |
> --29.44%--walk_component
> |
> --29.10%--lookup_fast
> |
> --28.76%--kernfs_dop_revalidate
> |
> --28.29%--__mutex_lock.constprop.0
> |
> --27.65%--osq_lock
>
> 6.17% 0.03% fops_overhead [kernel.kallsyms] [k] may_open
> |
> --6.14%--may_open
> |
> --6.13%--inode_permission.part.0
> |
> --6.10%--kernfs_iop_permission
> |
> --5.90%--__mutex_lock.constprop.0
> |
> --5.80%--osq_lock
>
> 1.22% 0.00% fops_overhead [unknown] [k] 0x5541d68949564100
> |
> ---0x5541d68949564100
> __libc_start_main
> |
> |--0.68%--__close
> | |
> | --0.66%--entry_SYSCALL_64_after_hwframe
> | do_syscall_64
> | |
> | --0.61%--__prepare_exit_to_usermode
> | |
> | --0.58%--task_work_run
> |
> --0.54%--read
>
> 1.22% 0.00% fops_overhead libc-2.31.so [.] __libc_start_main
> |
> ---__libc_start_main
> |
> |--0.68%--__close
> | |
> | --0.66%--entry_SYSCALL_64_after_hwframe
> | do_syscall_64
> | |
> | --0.61%--__prepare_exit_to_usermode
> | |
> | --0.58%--task_work_run
> |
> --0.54%--read
>
> 1.06% 1.05% fops_overhead [kernel.kallsyms] [k] mutex_unlock
> |
> --1.02%--0x7379732f73656369
> __GI___libc_open
> entry_SYSCALL_64_after_hwframe
> do_syscall_64
> __x64_sys_openat
> do_sys_openat2
> do_filp_open
> path_openat
> |
> --0.80%--link_path_walk.part.0
> |
> --0.60%--inode_permission.part.0
> kernfs_iop_permission
> |
> --0.59%--mutex_unlock
>
> 0.88% 0.79% fops_overhead [kernel.kallsyms] [k] mutex_lock
> |
> --0.68%--0x7379732f73656369
> __GI___libc_open
> entry_SYSCALL_64_after_hwframe
> do_syscall_64
> __x64_sys_openat
> do_sys_openat2
> do_filp_open
> path_openat
>
> 0.68% 0.01% fops_overhead libc-2.31.so [.] __close
> |
> --0.67%--__close
> |
> --0.66%--entry_SYSCALL_64_after_hwframe
> do_syscall_64
> |
> --0.61%--__prepare_exit_to_usermode
> |
> --0.58%--task_work_run
>
> 0.63% 0.05% fops_overhead [kernel.kallsyms] [k] __prepare_exit_to_usermode
> |
> --0.58%--__prepare_exit_to_usermode
> task_work_run
>
> 0.58% 0.00% fops_overhead [kernel.kallsyms] [k] task_work_run
> |
> ---task_work_run
>
> 0.58% 0.10% fops_overhead [kernel.kallsyms] [k] dput
> 0.56% 0.55% fops_overhead [kernel.kallsyms] [k] mutex_spin_on_owner
> |
> --0.55%--0x7379732f73656369
> __GI___libc_open
> entry_SYSCALL_64_after_hwframe
> do_syscall_64
> __x64_sys_openat
> do_sys_openat2
> do_filp_open
> path_openat
>
> 0.54% 0.00% fops_overhead libc-2.31.so [.] read
> |
> ---read
>
> 0.52% 0.12% fops_overhead [kernel.kallsyms] [k] do_dentry_open
> 0.50% 0.00% fops_overhead [kernel.kallsyms] [k] ksys_read
> 0.50% 0.03% fops_overhead [kernel.kallsyms] [k] vfs_read
> 0.46% 0.05% fops_overhead [kernel.kallsyms] [k] __fput
> 0.45% 0.45% fops_overhead [kernel.kallsyms] [k] lockref_put_return
> 0.43% 0.43% fops_overhead [kernel.kallsyms] [k] osq_unlock
> 0.41% 0.08% fops_overhead [kernel.kallsyms] [k] step_into
> 0.41% 0.08% fops_overhead [kernel.kallsyms] [k] __d_lookup
> 0.37% 0.35% fops_overhead [kernel.kallsyms] [k] _raw_spin_lock
> 0.35% 0.03% fops_overhead [kernel.kallsyms] [k] seq_read
> 0.28% 0.01% fops_overhead [kernel.kallsyms] [k] kernfs_fop_open
> 0.27% 0.03% fops_overhead [kernel.kallsyms] [k] kernfs_fop_release
> 0.16% 0.01% fops_overhead [kernel.kallsyms] [k] kernfs_put_open_node
> 0.16% 0.00% fops_overhead [kernel.kallsyms] [k] terminate_walk
> 0.12% 0.01% fops_overhead [kernel.kallsyms] [k] __alloc_file
> 0.12% 0.00% fops_overhead [kernel.kallsyms] [k] alloc_empty_file
> 0.12% 0.01% fops_overhead [kernel.kallsyms] [k] unlazy_walk
> 0.12% 0.05% fops_overhead [kernel.kallsyms] [k] _cond_resched
> 0.12% 0.07% fops_overhead [kernel.kallsyms] [k] call_rcu
> 0.10% 0.00% fops_overhead [kernel.kallsyms] [k] __legitimize_path
> 0.09% 0.05% fops_overhead [kernel.kallsyms] [k] sysfs_kf_seq_show
> 0.09% 0.09% fops_overhead [kernel.kallsyms] [k] generic_permission
> 0.09% 0.07% fops_overhead [kernel.kallsyms] [k] rcu_all_qs
> 0.09% 0.01% fops_overhead [kernel.kallsyms] [k] security_file_open
> 0.08% 0.00% fops_overhead [kernel.kallsyms] [k] security_file_alloc
> 0.08% 0.08% fops_overhead [kernel.kallsyms] [k] lockref_get_not_dead
> 0.08% 0.03% fops_overhead [kernel.kallsyms] [k] kmem_cache_alloc
> 0.08% 0.08% fops_overhead [kernel.kallsyms] [k] apparmor_file_open
> 0.07% 0.05% fops_overhead [kernel.kallsyms] [k] kfree
> 0.05% 0.05% fops_overhead [kernel.kallsyms] [k] kernfs_fop_read
> 0.05% 0.05% fops_overhead [kernel.kallsyms] [k] set_nlink
> 0.05% 0.01% fops_overhead [kernel.kallsyms] [k] kernfs_seq_start
> 0.05% 0.03% fops_overhead [kernel.kallsyms] [k] path_init
> 0.05% 0.00% fops_overhead [kernel.kallsyms] [k] __x64_sys_close
> 0.05% 0.00% fops_overhead [kernel.kallsyms] [k] filp_close
> 0.05% 0.05% fops_overhead [kernel.kallsyms] [k] syscall_return_via_sysret
> 0.05% 0.03% fops_overhead [kernel.kallsyms] [k] __kmalloc_node
> 0.05% 0.05% fops_overhead [kernel.kallsyms] [k] rcu_segcblist_enqueue
> 0.04% 0.04% fops_overhead [kernel.kallsyms] [k] vfs_open
> 0.04% 0.04% fops_overhead [kernel.kallsyms] [k] _raw_spin_lock_irqsave
> 0.04% 0.03% fops_overhead [kernel.kallsyms] [k] sprintf
> 0.04% 0.00% fops_overhead [kernel.kallsyms] [k] dev_attr_show
> 0.04% 0.00% fops_overhead [kernel.kallsyms] [k] die_id_show
> 0.04% 0.04% fops_overhead [kernel.kallsyms] [k] kmem_cache_free
> 0.04% 0.04% fops_overhead [kernel.kallsyms] [k] fsnotify_parent
> 0.04% 0.04% fops_overhead [kernel.kallsyms] [k] security_inode_permission
> 0.04% 0.01% fops_overhead [kernel.kallsyms] [k] __check_object_size
> 0.04% 0.04% fops_overhead [kernel.kallsyms] [k] apparmor_file_alloc_security
> 0.04% 0.00% fops_overhead [kernel.kallsyms] [k] seq_release
> 0.04% 0.04% fops_overhead [kernel.kallsyms] [k] memset_erms
> 0.04% 0.04% fops_overhead [kernel.kallsyms] [k] kernfs_get_active
> 0.03% 0.00% fops_overhead [kernel.kallsyms] [k] try_to_wake_up
> 0.03% 0.00% fops_overhead [kernel.kallsyms] [k] vsnprintf
> 0.03% 0.01% fops_overhead [kernel.kallsyms] [k] mntput_no_expire
> 0.03% 0.03% fops_overhead [kernel.kallsyms] [k] lockref_get
> 0.03% 0.03% fops_overhead [kernel.kallsyms] [k] kernfs_put_active
> 0.03% 0.03% fops_overhead [kernel.kallsyms] [k] fsnotify
> 0.03% 0.03% fops_overhead [kernel.kallsyms] [k] locks_remove_posix
> 0.03% 0.00% fops_overhead [kernel.kallsyms] [k] security_file_permission
> 0.03% 0.03% fops_overhead [kernel.kallsyms] [k] rw_verify_area
> 0.03% 0.03% fops_overhead [kernel.kallsyms] [k] set_root
> 0.03% 0.00% fops_overhead [kernel.kallsyms] [k] nd_jump_root
> 0.03% 0.01% fops_overhead [kernel.kallsyms] [k] wake_up_q
> 0.03% 0.00% fops_overhead [kernel.kallsyms] [k] __mutex_unlock_slowpath.constprop.0
> 0.03% 0.00% fops_overhead [kernel.kallsyms] [k] getname_flags.part.0
> 0.03% 0.03% fops_overhead [kernel.kallsyms] [k] task_work_add
> 0.03% 0.00% fops_overhead [kernel.kallsyms] [k] fput_many
> 0.03% 0.03% fops_overhead [kernel.kallsyms] [k] __legitimize_mnt
> 0.03% 0.01% fops_overhead [kernel.kallsyms] [k] kernfs_seq_stop
> 0.02% 0.02% fops_overhead [nfs] [k] nfs_do_access
> 0.02% 0.00% fops_overhead ld-2.31.so [.] _dl_map_object
> 0.02% 0.00% fops_overhead ld-2.31.so [.] open_path
> 0.02% 0.00% fops_overhead ld-2.31.so [.] __GI___open64_nocancel
> 0.02% 0.00% fops_overhead [nfs] [k] nfs_permission
> 0.01% 0.01% fops_overhead [kernel.kallsyms] [k] kernfs_seq_next
> 0.01% 0.01% fops_overhead [kernel.kallsyms] [k] available_idle_cpu
> 0.01% 0.00% fops_overhead [unknown] [k] 0x3931206e69207364
> 0.01% 0.00% fops_overhead libc-2.31.so [.] __GI___libc_write
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] ksys_write
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] vfs_write
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] tty_write
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] n_tty_write
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] pty_write
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] queue_work_on
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] __queue_work
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] select_task_rq_fair
> 0.01% 0.01% fops_overhead [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
> 0.01% 0.01% fops_overhead [kernel.kallsyms] [k] native_queued_spin_lock_slowpath
> 0.01% 0.01% fops_overhead [kernel.kallsyms] [k] slab_free_freelist_hook
> 0.01% 0.01% fops_overhead [kernel.kallsyms] [k] __list_del_entry_valid
> 0.01% 0.01% fops_overhead [kernel.kallsyms] [k] memcg_kmem_put_cache
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] __syscall_return_slowpath
> 0.01% 0.01% fops_overhead libc-2.31.so [.] _dl_addr
> 0.01% 0.00% fops_overhead [unknown] [.] 0x756e696c2d34365f
> 0.01% 0.00% fops_overhead [unknown] [.] 0x00007f4b1ca1e000
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] __x86_indirect_thunk_rax
> 0.01% 0.01% fops_overhead [kernel.kallsyms] [k] __virt_addr_valid
> 0.01% 0.01% fops_overhead [kernel.kallsyms] [k] locks_remove_file
> 0.01% 0.01% fops_overhead [kernel.kallsyms] [k] memcpy_erms
> 0.01% 0.01% fops_overhead [kernel.kallsyms] [k] update_rq_clock
> 0.01% 0.01% fops_overhead [kernel.kallsyms] [k] entry_SYSCALL_64
> 0.01% 0.01% fops_overhead [kernel.kallsyms] [k] __check_heap_object
> 0.01% 0.01% fops_overhead [kernel.kallsyms] [k] apparmor_file_free_security
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] security_file_free
> 0.01% 0.01% fops_overhead [kernel.kallsyms] [k] __d_lookup_rcu
> 0.01% 0.01% fops_overhead [kernel.kallsyms] [k] mntput
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] get_unused_fd_flags
> 0.01% 0.01% fops_overhead [kernel.kallsyms] [k] alloc_slab_page
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] __slab_alloc
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] ___slab_alloc
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] allocate_slab
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] __alloc_fd
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] legitimize_root
> 0.01% 0.01% fops_overhead [kernel.kallsyms] [k] strncpy_from_user
> 0.01% 0.01% fops_overhead [kernel.kallsyms] [k] kernfs_refresh_inode
> 0.01% 0.01% fops_overhead [kernel.kallsyms] [k] build_open_flags
> 0.01% 0.01% fops_overhead [kernel.kallsyms] [k] strcmp
> 0.01% 0.01% fops_overhead [kernel.kallsyms] [k] memcg_kmem_get_cache
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] asm_sysvec_apic_timer_interrupt
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] sysvec_apic_timer_interrupt
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] asm_call_sysvec_on_stack
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] __sysvec_apic_timer_interrupt
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] hrtimer_interrupt
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] __hrtimer_run_queues
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] tick_sched_timer
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] tick_sched_handle
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] update_process_times
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] scheduler_tick
> 0.01% 0.01% fops_overhead [kernel.kallsyms] [k] perf_iterate_ctx
> 0.01% 0.00% fops_overhead [unknown] [k] 0x00007fd34e3a0627
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] __x64_sys_execve
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] do_execve
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] __do_execve_file
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] load_elf_binary
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] elf_map
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] vm_mmap_pgoff
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] do_mmap
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] mmap_region
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] perf_event_mmap
> 0.01% 0.00% fops_overhead [kernel.kallsyms] [k] perf_iterate_sb
> 0.00% 0.00% perf_5.8 [unknown] [k] 0x00007fd34e3a0627
> 0.00% 0.00% perf_5.8 [kernel.kallsyms] [k] entry_SYSCALL_64_after_hwframe
> 0.00% 0.00% perf_5.8 [kernel.kallsyms] [k] perf_event_exec
> 0.00% 0.00% perf_5.8 [kernel.kallsyms] [k] do_syscall_64
> 0.00% 0.00% perf_5.8 [kernel.kallsyms] [k] __x64_sys_execve
> 0.00% 0.00% perf_5.8 [kernel.kallsyms] [k] do_execve
> 0.00% 0.00% perf_5.8 [kernel.kallsyms] [k] __do_execve_file
> 0.00% 0.00% perf_5.8 [kernel.kallsyms] [k] load_elf_binary
> 0.00% 0.00% perf_5.8 [kernel.kallsyms] [k] begin_new_exec
> 0.00% 0.00% perf_5.8 [kernel.kallsyms] [k] native_write_msr
> 0.00% 0.00% perf_5.8 [kernel.kallsyms] [k] __intel_pmu_enable_all.constprop.0
> 0.00% 0.00% perf_5.8 [kernel.kallsyms] [k] acpi_os_read_memory
>
>
> #
> # (Tip: To count events in every 1000 msec: perf stat -I 1000)
> #
>

Hi Brice,

I wrote a benchmark that does open+read+close on
/sys/devices/system/cpu/cpu0/topology/die_id:
https://github.com/foxhlchen/sysfs_benchmark/blob/main/main.c


+  3.39%  3.37%  a.out  [kernel.kallsyms]  [k] mutex_unlock
+  2.76%  2.74%  a.out  [kernel.kallsyms]  [k] mutex_lock
+  0.92%  0.42%  a.out  [kernel.kallsyms]  [k] __mutex_lock.constprop.0
   0.38%  0.37%  a.out  [kernel.kallsyms]  [k] mutex_spin_on_owner
   0.05%  0.05%  a.out  [kernel.kallsyms]  [k] __mutex_init
   0.01%  0.01%  a.out  [kernel.kallsyms]  [k] __mutex_lock_slowpath
   0.01%  0.00%  a.out  [kernel.kallsyms]  [k] __mutex_unlock_slowpath.constprop.0

But I failed to reproduce your result: in my run, the mutex functions listed
above account for only a few percent of the samples.

If it is possible, would you mind providing your benchmark code? :)


thanks,
fox