Re: [PATCH] perf lock: clean the options for perf record

From: Frederic Weisbecker
Date: Thu Feb 24 2011 - 11:52:18 EST

On Fri, Feb 25, 2011 at 12:46:40AM +0900, Hitoshi Mitake wrote:
> On 2011年02月23日 13:17, Hitoshi Mitake wrote:
> >On 2011年02月23日 03:22, Frederic Weisbecker wrote:
> >>On Tue, Feb 22, 2011 at 04:43:35PM +0100, Peter Zijlstra wrote:
> >>>On Wed, 2011-02-23 at 00:30 +0900, Hitoshi Mitake wrote:
> >>>>How do you think about it?
> >>>
> >>>Most of the lock code (esp the spinlock stuff) is already way over the
> >>>threshold of sanity, adding to that for some dubious reasons doesn't
> >>>seem like a good idea.
> >>>
> >>>I'm still not at all sure why people want all this lock tracing.
> >>
> >>Right, well I can imagine many usecases that could make lock
> >>tracing bring more value than what lockstat already provides,
> >>through a tool like perf lock if we enhance it.
> >>
> >>We should probably first focus on developing the tooling side
> >>and make it useful enough that optimizations in the kernel
> >>side become desirable.
> >>
> >
> >Yes, lockstat only provides the lock usage statistics of
> >entire of the system. perf lock will be able to provide the partial
> >information of specified term, or the degree of dependency
> >between locks.
> >
> For trial, I created new tracepoint for rwsem and tested.
> Names of events are rwsem_{acquire, contended, acquired, release},
> their meanings are similar to lock_{...}.
> I traced perf bench sched messaging and result was,
> mitake@x201i:~/linux/.../tools/perf% ./perf bench sched messaging
> # Running sched/messaging benchmark...
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
> Total time: 1.252 [sec]
> mitake@x201i:~/linux/.../tools/perf% sudo ./perf record -R -m 1024
> -c 1 -e rwsem:rwsem_acquire -e
> rwsem:rwsem_release,rwsem:rwsem_contended,rwsem:rwsem_acquired
> ./perf bench sched messaging
> # Running sched/messaging benchmark...
> # 20 sender and receiver processes per group
> # 10 groups == 400 processes run
> Total time: 1.332 [sec]
> [ perf record: Woken up 4 times to write data ]
> [ perf record: Captured and wrote 13.495 MB (~589597 samples) ]
> raw execution of sched messaging was 1.252 sec, and traced version
> was 1.332 sec. This overhead is far smaller than the overhead of
> current lock tracepoints.

Probably because rwsems are only a small subset of all the locks.
If you were to trace only spinlocks, I bet you'd find a significant
overhead, pretty close to that of tracing every lock.
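
For reference, the relative overhead of the quoted rwsem run can be
worked out from the two reported totals. A minimal sketch in Python;
the function name is only illustrative, and the numbers are the ones
from the benchmark output above:

```python
def relative_overhead(baseline_sec, traced_sec):
    """Return the slowdown of the traced run as a fraction of the baseline."""
    return (traced_sec - baseline_sec) / baseline_sec

# Totals reported by the two "perf bench sched messaging" runs quoted above.
overhead = relative_overhead(1.252, 1.332)
print(f"rwsem tracing overhead: {overhead:.1%}")
```

That comes to roughly 6.4% for a single run each, so it says little
about what tracing a hotter lock family would cost.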

> I think that it is possible to write some meaningful tools
> like reader/writer ratio measuring. If something can be written,
> I'll post it.

Consider the situation from another angle: do you think that a lock
profiling on top of lock types is a kind of workflow that will be

The primary kind of workflow I have in mind for lock tracing is:

1) Let's look at the big picture, trace all locks and find those
that seem to be an issue (too much waiting time, too much
acquire time, etc...).

2) Pick one we are interested in and dig into details
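
The two steps above could be sketched roughly as follows. This is only
an illustration under stated assumptions: the event tuples are a
hypothetical, already-parsed form (not perf's on-disk format), the lock
names are made up, and events are paired per lock only, ignoring which
thread took the lock:

```python
from collections import defaultdict

# Hypothetical parsed events: (lock_name, event, timestamp_ns).
# The layout and the lock names are assumptions for illustration only.
events = [
    ("lock_a", "acquire", 100), ("lock_a", "acquired", 150), ("lock_a", "release", 400),
    ("lock_b", "acquire", 120), ("lock_b", "acquired", 900), ("lock_b", "release", 950),
    ("lock_a", "acquire", 500), ("lock_a", "acquired", 520), ("lock_a", "release", 530),
]

def summarize(events):
    """Step 1: total wait (acquire -> acquired) and hold (acquired -> release) per lock."""
    stats = defaultdict(lambda: {"wait": 0, "hold": 0})
    pending = {}  # lock_name -> (previous event, its timestamp)
    for name, event, ts in events:
        prev = pending.get(name)
        if event == "acquired" and prev and prev[0] == "acquire":
            stats[name]["wait"] += ts - prev[1]
        elif event == "release" and prev and prev[0] == "acquired":
            stats[name]["hold"] += ts - prev[1]
        pending[name] = (event, ts)
    return dict(stats)

def worst_by_wait(stats):
    """Step 2: pick the lock with the largest total wait time to dig into."""
    return max(stats, key=lambda name: stats[name]["wait"])

stats = summarize(events)
suspect = worst_by_wait(stats)
print(suspect, stats[suspect])
```

A real tool would have to key the pending state on (thread, lock) to
handle concurrent acquirers; the per-lock pairing here is a
simplification.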

But I can't figure out any common workflow that would be based
on mutex-only tracing, or rwsem-only tracing.
Or actually, I can imagine such a workflow. Every lock
type has its own scale of latencies, so it's interesting
to group the analysis per family. But I rather see
that as a secondary workflow, for once we have more fine-grained
analysis in the tools, for example a comparison between
read and write latencies on some rwsems.
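
As an illustration of that kind of comparison, here is a minimal
sketch. It assumes each sample already carries the rwsem name, whether
it was taken for read or write, and the measured wait time; that sample
layout, the "mode" field, and the semaphore name are all hypothetical
and not something the proposed tracepoints are known to provide:

```python
from statistics import mean

# Hypothetical samples: (rwsem_name, mode, wait_ns). Assumed for illustration.
samples = [
    ("sem_x", "read", 40), ("sem_x", "read", 60),
    ("sem_x", "write", 500), ("sem_x", "write", 300),
]

def read_write_latency(samples, sem):
    """Return (mean read wait, mean write wait) for one rwsem, None if no samples."""
    reads = [w for name, mode, w in samples if name == sem and mode == "read"]
    writes = [w for name, mode, w in samples if name == sem and mode == "write"]
    return (mean(reads) if reads else None,
            mean(writes) if writes else None)

read_avg, write_avg = read_write_latency(samples, "sem_x")
print(f"sem_x: read {read_avg} ns, write {write_avg} ns")
```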

So once we have some such fine-grained and useful features on the
tooling side, justifying such a change in the kernel is going
to be much less controversial.