On Thu, 2010-01-07 at 19:39 +0900, Hitoshi Mitake wrote:
There are a lot of lock instances with the same name (e.g. port_lock).
This patch series adds __FILE__ and __LINE__ to lockdep_map,
and these will be used when tracing lock events.
Example use from perf lock map:
| 0xffffea0004c992b8: __pte_lockptr(page) (src: include/linux/mm.h, line: 952)
| 0xffffea0004b112b8: __pte_lockptr(page) (src: include/linux/mm.h, line: 952)
| 0xffffea0004a3f2b8: __pte_lockptr(page) (src: include/linux/mm.h, line: 952)
| 0xffffea0004cd5228: __pte_lockptr(page) (src: include/linux/mm.h, line: 952)
| 0xffff8800b91e2b28:&sb->s_type->i_lock_key (src: fs/inode.c, line: 166)
| 0xffff8800bb9d7ae0: key (src: kernel/wait.c, line: 16)
| 0xffff8800aa07dae0:&dentry->d_lock (src: fs/dcache.c, line: 944)
| 0xffff8800b07fbae0:&dentry->d_lock (src: fs/dcache.c, line: 944)
| 0xffff8800b07f3ae0:&dentry->d_lock (src: fs/dcache.c, line: 944)
| 0xffff8800bf15fae0:&sighand->siglock (src: kernel/fork.c, line: 1490)
| 0xffff8800b90f7ae0:&dentry->d_lock (src: fs/dcache.c, line: 944)
| ...
(This output of perf lock map is produced by my local version,
I'll send this later.)
And sadly, as Peter Zijlstra predicted, this produces a certain amount of overhead.
Before applying this series:
| % sudo ./perf lock rec perf bench sched messaging
| # Running sched/messaging benchmark...
| # 20 sender and receiver processes per group
| # 10 groups == 400 processes run
|
| Total time: 3.834 [sec]
After:
| % sudo ./perf lock rec perf bench sched messaging
| # Running sched/messaging benchmark...
| # 20 sender and receiver processes per group
| # 10 groups == 400 processes run
|
| Total time: 5.415 [sec]
| [ perf record: Woken up 0 times to write data ]
| [ perf record: Captured and wrote 53.512 MB perf.data (~2337993 samples) ]
But a plain run of perf bench sched messaging, without lock tracing, gives this:
| % perf bench sched messaging
| # Running sched/messaging benchmark...
| # 20 sender and receiver processes per group
| # 10 groups == 400 processes run
|
| Total time: 0.498 [sec]
Tracing lock events already produces a considerable amount of overhead by itself.
I think the overhead added by this series is not a fatal problem,
but radical optimization is required...
Right, these patches look OK. As for the tracing overhead, you could
possibly hash the file:line into a u64 and reduce the tracepoint size;
that should improve the situation, I think, because I seem to remember
the only thing that really matters for speed is the size of things.