RE: [PATCH 0/2] tools perf: Add a new benchmark tool for semaphore/mutex

From: Chen, Dennis (SRDC SW)
Date: Mon Apr 16 2012 - 10:10:42 EST


On Mon, Apr 16, 2012 at 5:24 PM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
> * Chen, Dennis (SRDC SW) <Dennis1.Chen@xxxxxxx> wrote:
>
>> <PATCH PREFACE>
>> -------------------
>> This patch series adds a new performance benchmark tool for semaphore or mutex.
>> The new tool forks the number of tasks specified on the command line and binds them
>> evenly across the CPUs in the system. The command to launch the tool looks like:
>> '# perf bench locking mutex -p 8 -t 400 -c'
>>
>> The above command creates 400 tasks on an 8-CPU system, so each CPU gets 50 tasks.
>> Once a task is created, it reads all the files and directories in '/sys/module'.
>> sysfs is RAM based, and its read path for both directories and files is very sensitive to
>> mutex locking; '/sys/module' also has almost no dependencies on external devices.
>>
>> We can use this tool with 'perf record' to find the hot spots in the code, or with
>> 'perf top -g' to get live info. For example, below is a test case run on an Intel i7-2600 box
>> (the -c option reports CPU cycles; I don't use it in this test case):
>>
>> # perf record -a perf bench locking mutex -p 8 -t 4000
>> # Running locking/mutex benchmark...
>> ...
>> [13894 ]/6 duration 23 s 609392 us
>> [13996 ]/4 duration 23 s 599418 us
>> [14056 ]/0 duration 23 s 595710 us
>> [13715 ]/3 duration 23 s 621719 us
>> [13390 ]/6 duration 23 s 644020 us
>> [13696 ]/0 duration 23 s 623101 us
>> [14334 ]/6 duration 23 s 580262 us
>> [14343 ]/7 duration 23 s 578702 us
>> [14283 ]/3 duration 23 s 583007 us
>> -----------------------------------
>> Total duration 79353 s 943945 us
>>
>> real: 23.84 s
>> user: 0.00
>> sys: 0.45
>>
>> # perf report
>> ===================================================================================
>> ...
>> # perf version : 3.3.2
>> # arch : x86_64
>> # nrcpus online : 8
>> # nrcpus avail : 8
>> # cpudesc : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
>> # total memory : 3966460 kB
>> # cmdline : /usr/bin/perf record -a perf bench locking mutex -p 8 -t 4000
>>
>> # Events: 131K cycles
>> #
>> # Overhead Command Shared Object Symbol
>> # ........ ............... ................................. .....................................
>> #
>> 22.12% perf [kernel.kallsyms] [k] __mutex_lock_slowpath
>> 8.27% perf [kernel.kallsyms] [k] _raw_spin_lock
>> 6.16% perf [kernel.kallsyms] [k] mutex_unlock
>> 5.22% perf [kernel.kallsyms] [k] mutex_spin_on_owner
>> 4.94% perf [kernel.kallsyms] [k] sysfs_refresh_inode
>> 4.82% perf [kernel.kallsyms] [k] mutex_lock
>> 2.67% perf [kernel.kallsyms] [k] __mutex_unlock_slowpath
>> 2.61% perf [kernel.kallsyms] [k] link_path_walk
>> 2.42% perf [kernel.kallsyms] [k] _raw_spin_lock_irqsave
>> 1.61% perf [kernel.kallsyms] [k] __d_lookup
>> 1.18% perf [kernel.kallsyms] [k] clear_page_c
>> 1.16% perf [kernel.kallsyms] [k] dput
>> 0.97% perf [kernel.kallsyms] [k] do_lookup
>> 0.93% swapper [kernel.kallsyms] [k] intel_idle
>> 0.87% perf [kernel.kallsyms] [k] get_page_from_freelist
>> 0.85% perf [kernel.kallsyms] [k] __strncpy_from_user
>> 0.81% perf [kernel.kallsyms] [k] system_call
>> 0.78% perf libc-2.13.so [.] 0x84ef0
>> 0.71% perf [kernel.kallsyms] [k] vfsmount_lock_local_lock
>> 0.68% perf [kernel.kallsyms] [k] sysfs_dentry_revalidate
>> 0.62% perf [kernel.kallsyms] [k] try_to_wake_up
>> 0.62% perf [kernel.kallsyms] [k] kfree
>> 0.60% perf [kernel.kallsyms] [k] kmem_cache_alloc
>> ............................................................................................
>>
>
> Nice! Would be nice to lift some of this information over into
> the changelogs, to address my complaints in the previous mail.

Thanks for the suggestion! I will resubmit the patches as a single patch and include the above info
to address the changelog issue...
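
To make the workload described in the preface concrete, here is a rough user-space sketch in Python. It is not the code from the patches, only an illustration of the same idea: fork a number of tasks, pin them round-robin to CPUs, and let each one read everything under '/sys/module'. The function names (cpu_for_task, read_tree, run_tasks) are made up for this example, and it assumes that '-p' means the CPU count and '-t' the task count, as the example command suggests.

```python
import os
import time


def cpu_for_task(task_index, nr_cpus):
    """Round-robin placement: task i goes to slot i % nr_cpus."""
    return task_index % nr_cpus


def read_tree(root):
    """Read every file under root once; return the number of files read."""
    files_read = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                with open(os.path.join(dirpath, name), "rb") as f:
                    f.read()
                files_read += 1
            except OSError:
                # Some sysfs attributes are write-only or fail on read; skip them.
                pass
    return files_read


def run_tasks(nr_tasks, nr_cpus, root="/sys/module"):
    """Fork nr_tasks children, pin each to one CPU, and let each read root.

    Linux only: relies on os.fork and os.sched_setaffinity.
    """
    allowed = sorted(os.sched_getaffinity(0))
    pids = []
    for i in range(nr_tasks):
        pid = os.fork()
        if pid == 0:
            status = 1
            try:
                # Map the slot onto a CPU this process is allowed to use.
                cpu = allowed[cpu_for_task(i, nr_cpus) % len(allowed)]
                os.sched_setaffinity(0, {cpu})
                start = time.monotonic()
                read_tree(root)
                elapsed = time.monotonic() - start
                print(f"[{os.getpid()}]/{cpu} duration {elapsed:.6f} s", flush=True)
                status = 0
            finally:
                os._exit(status)  # the child must never return into the parent loop
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)
    return len(pids)
```

Calling run_tasks(400, 8) would roughly mirror the '-p 8 -t 400' example, with 50 tasks landing on each of the 8 slots. Interpreter overhead means the timings are not comparable to the perf bench numbers.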

>> We can see that 4000 tasks running on 8 CPUs simultaneously create very heavy
>> contention on the mutex, so lots of tasks enter the slow path of the mutex lock...
>> I am very curious how things would go if we switched the mutex to a semaphore in this case.
>> My next plan
>
> Seems like an unfinished sentence.

Oh, I mean my next plan is to do some performance analysis of the two primitives with this tool...
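
As a starting point for that kind of comparison, the small Python sketch below adds up the share of cycles spent in mutex symbols in the 'perf report' output quoted above. The helper names (parse_report, share) are invented for this example. It deliberately leaves out the _raw_spin_lock rows, since the report alone does not say how much of that time comes from the mutex code.

```python
# Top rows copied from the 'perf report' output quoted above.
REPORT = """\
22.12% perf [kernel.kallsyms] [k] __mutex_lock_slowpath
8.27% perf [kernel.kallsyms] [k] _raw_spin_lock
6.16% perf [kernel.kallsyms] [k] mutex_unlock
5.22% perf [kernel.kallsyms] [k] mutex_spin_on_owner
4.94% perf [kernel.kallsyms] [k] sysfs_refresh_inode
4.82% perf [kernel.kallsyms] [k] mutex_lock
2.67% perf [kernel.kallsyms] [k] __mutex_unlock_slowpath
2.61% perf [kernel.kallsyms] [k] link_path_walk
2.42% perf [kernel.kallsyms] [k] _raw_spin_lock_irqsave
"""


def parse_report(text):
    """Return (symbol, overhead_percent) pairs from perf-report style rows."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[0].endswith("%"):
            rows.append((fields[-1], float(fields[0].rstrip("%"))))
    return rows


def share(rows, substring):
    """Sum the overhead of all symbols whose name contains substring."""
    return round(sum(pct for sym, pct in rows if substring in sym), 2)


rows = parse_report(REPORT)
print(f"mutex symbols: {share(rows, 'mutex'):.2f}%")     # mutex symbols: 40.99%
print(f"slow paths:    {share(rows, 'slowpath'):.2f}%")  # slow paths:    24.79%
```

Running the same sum over a report taken with a semaphore in place of the mutex would give a first number to compare against.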

> Thanks,
>
> Ingo
