[PATCH V5 0/3] perf & kvm: Enhance perf to collect KVM guest os statistics from host side

From: Zhang, Yanmin
Date: Mon Apr 19 2010 - 01:33:24 EST


Here is the new patch of V5 against tip/master of April 17th
if anyone wants to try it.

ChangeLog V5:
1) Split the kernel patch into 2 parts. The first introduces
perf_guest_info_callbacks() and the related register/unregister
functions. The second is the kvm implementation of the callbacks.
2) Port to tip/master tree of April 17th.
3) Fix a bug which caused module parsing of the default guest kernel
to fail.

ChangeLog V4:
1) Based on Ingo's comments, I added help information for kvm,
such as command-list.txt and perf-kvm.txt.
2) Added the guest process id at the tail of the kernel dso long name, so
the display can show a different label for each guest os.
3) Based on Avi's comments, removed the racy window which might
trigger an NMI while the NMI isn't in guest os.
4) Fixed all the errors and warnings reported by scripts/checkpatch.pl.
5) Fixed a compilation error pointed out by Yang Sheng.

ChangeLog V3:
1) Add --guestmount=/dir/to/all/guestos parameter. The admin mounts each guest os
root directory under /dir/to/all/guestos by sshfs, in a subdirectory named after
the guest's pid. For example, I start 2 guest os. One has pid 8888 and the other 9999.
#mkdir ~/guestmount; cd ~/guestmount
#sshfs -o allow_other,direct_io -p 5551 localhost:/ 8888/
#sshfs -o allow_other,direct_io -p 5552 localhost:/ 9999/
#perf kvm --host --guest --guestmount=~/guestmount top

The old --guestkallsyms and --guestmodules are still supported as default
guest os symbol parsing.

2) Add guest os buildid support.
3) Add sub command 'perf kvm buildid-list'.
4) Delete sub command 'perf kvm stat', because our current implementation
doesn't pass the guest/host requirement to the kernel, and the kernel always
collects both host and guest statistics. So regular 'perf stat' is ok.
5) Fix a couple of perf bugs.
6) We still have no support for commands with parameter 'any', as current KVM
just uses the process id to identify a specific guest os instance. Users can
use parameter -p to collect statistics for a specific guest os instance.
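
As a rough illustration of the --guestmount layout from item 1 of ChangeLog V3,
the sketch below lists the per-guest mount points under a guestmount directory.
It assumes only what the example shows, i.e. one subdirectory per guest named
after the guest's pid. The function name list_guest_mounts is hypothetical and
is not part of perf.

```python
import os


def list_guest_mounts(guestmount):
    """Return {pid: path} for every pid-named subdirectory of guestmount.

    Assumption: each guest root is mounted at <guestmount>/<pid>/, as in
    the sshfs example (8888/ and 9999/). Other entries are ignored.
    """
    guests = {}
    for name in sorted(os.listdir(guestmount)):
        path = os.path.join(guestmount, name)
        # Keep only directories whose name is a plain decimal pid.
        if name.isascii() and name.isdigit() and os.path.isdir(path):
            guests[int(name)] = path
    return guests
```

With the two sshfs mounts from the example in place, calling
list_guest_mounts(os.path.expanduser("~/guestmount")) should return entries
for 8888 and 9999.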

ChangeLog V2:
1) Based on Avi's suggestion, I moved the callback functions
to the generic code area, so the kernel part of the patch is
clearer.
2) Add 'perf kvm stat'.


From: Zhang, Yanmin <yanmin_zhang@xxxxxxxxxxxxxxx>

Based on the discussion in the KVM community, I worked out this patch to let
perf collect guest os statistics from the host side. The patch was implemented
with kind help from Ingo, Peter and others. Yang Sheng pointed out a
critical bug and, along with others, provided good suggestions. I really appreciate
their help.

The patch adds a new sub command, kvm, to perf.

perf kvm top
perf kvm record
perf kvm report
perf kvm diff
perf kvm buildid-list

The new perf can profile the guest os kernel but not guest os user space. It
can, however, summarize guest os user space utilization per guest os.

Below are some examples.
1) perf kvm top
[root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
--guestmodules=/home/ymzhang/guest/modules top

---------------------------------------------------------------------------------------------------------------------------------------
PerfTop: 16024 irqs/sec kernel: 2.6% us: 0.6% guest kernel:76.2% guest us:20.6% exact: 0.0% [1000Hz cycles], (all, 16 CPUs)
---------------------------------------------------------------------------------------------------------------------------------------

samples pcnt function DSO
_______ _____ ________________________ _______________________

3740.00 8.0% __ticket_spin_lock [guest.kernel.kallsyms]
2056.00 4.4% copy_user_generic_string [guest.kernel.kallsyms]
1412.00 3.0% resource_string [guest.kernel.kallsyms]
595.00 1.3% __switch_to [guest.kernel.kallsyms]
586.00 1.2% __d_lookup [guest.kernel.kallsyms]
574.00 1.2% tcp_sendmsg [guest.kernel.kallsyms]
565.00 1.2% kmem_cache_alloc [guest.kernel.kallsyms]
532.00 1.1% tcp_ack [guest.kernel.kallsyms]
494.00 1.1% __kmalloc [guest.kernel.kallsyms]
468.00 1.0% print_cfs_rq [guest.kernel.kallsyms]
437.00 0.9% link_path_walk [guest.kernel.kallsyms]
380.00 0.8% balance_runtime [guest.kernel.kallsyms]
379.00 0.8% kmem_cache_free [guest.kernel.kallsyms]
377.00 0.8% in_gate_area_no_task [guest.kernel.kallsyms]
374.00 0.8% get_page_from_freelist [guest.kernel.kallsyms]
372.00 0.8% mark_files_ro [guest.kernel.kallsyms]
368.00 0.8% _atomic_dec_and_lock [guest.kernel.kallsyms]
356.00 0.8% crc16 [crc16]
353.00 0.8% put_page [guest.kernel.kallsyms]

If you want to show host data only, please omit the --guest parameter.
The headline includes the guest os kernel and user space percentages.

2) perf kvm record
[root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
--guestmodules=/home/ymzhang/guest/modules record -f -a sleep 60
[ perf record: Woken up 15 times to write data ]
[ perf record: Captured and wrote 29.385 MB perf.data.kvm (~1283837 samples) ]

3) perf kvm report
3.1) [root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
--guestmodules=/home/ymzhang/guest/modules report --sort pid --showcpuutilization>norm.host.guest.report.pid
# Samples: 424719292247
#
# Overhead     sys      us  guest sys  guest us  Command: Pid
# ........                                       .....................
#
    50.57%   1.02%   0.00%     39.97%     9.58%  qemu-system-x86: 3587
    49.32%   1.35%   0.01%     35.20%    12.76%  qemu-system-x86: 3347
     0.07%   0.07%   0.00%      0.00%     0.00%  perf: 5217


Some performance engineers need perf to show sys/us/guest_sys/guest_us per KVM guest
instance, which is actually just a multi-threaded process. The sub parameter
--showcpuutilization above does this.
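
To make the relationship between the columns concrete, here is a small check
on the three rows of the report above. In this sample the sys, us, guest sys
and guest us columns add up to the Overhead column for each process. This is
an observation about the numbers shown here, not a statement about how perf
computes them. The helper name components_sum is hypothetical.

```python
# Rows copied from the report above:
# (overhead, sys, us, guest_sys, guest_us, command, pid), all in percent.
rows = [
    (50.57, 1.02, 0.00, 39.97, 9.58, "qemu-system-x86", 3587),
    (49.32, 1.35, 0.01, 35.20, 12.76, "qemu-system-x86", 3347),
    (0.07, 0.07, 0.00, 0.00, 0.00, "perf", 5217),
]


def components_sum(row):
    """Sum the four per-mode columns of one report row."""
    _, sys_pct, us_pct, guest_sys, guest_us, _, _ = row
    return round(sys_pct + us_pct + guest_sys + guest_us, 2)


for row in rows:
    overhead, command, pid = row[0], row[5], row[6]
    print(f"{command}:{pid} overhead={overhead:.2f} sum={components_sum(row):.2f}")
```

For example, for pid 3587: 1.02 + 0.00 + 39.97 + 9.58 = 50.57.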

3.2) [root@lkp-ne01 norm]# perf kvm --host --guest --guestkallsyms=/home/ymzhang/guest/kallsyms
--guestmodules=/home/ymzhang/guest/modules report >norm.host.guest.report
# Samples: 2466991384118
#
# Overhead Command Shared Object Symbol
# ........ ............... ........................................................................ ......
#
29.11% qemu-system-x86 [guest.kernel.kallsyms] [g] __ticket_spin_lock
5.88% tbench_srv [kernel.kallsyms] [k] ftrace_likely_update
5.76% tbench [kernel.kallsyms] [k] ftrace_likely_update
3.88% qemu-system-x86 34c3255482 [u] 0x000034c3255482
1.83% tbench [kernel.kallsyms] [k] __lock_acquire
1.81% tbench_srv [kernel.kallsyms] [k] __lock_acquire
1.38% tbench_srv [kernel.kallsyms] [k] trace_hardirqs_off_caller
1.37% tbench [kernel.kallsyms] [k] trace_hardirqs_off_caller
1.13% qemu-system-x86 [guest.kernel.kallsyms] [g] copy_user_generic_string
1.04% tbench_srv [kernel.kallsyms] [k] validate_chain
1.00% tbench [kernel.kallsyms] [k] trace_hardirqs_on_caller
1.00% tbench_srv [kernel.kallsyms] [k] trace_hardirqs_on_caller
0.95% tbench [kernel.kallsyms] [k] do_raw_spin_lock


[u] means the sample is in guest os user space. [g] means it is in the guest os kernel. The other
fields are self-explanatory. If the report shows a module such as [ext4], it is a guest kernel module,
because the native host kernel's modules are shown with a path starting with something like /lib/modules/XXX.
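
The markers can be used to post-process a saved report. Below is a minimal
sketch that sums the Overhead column per marker for a few lines copied from
the report above. The regular expression assumes the whitespace-separated
layout shown here, and the names LEVELS and overhead_by_level are hypothetical.
[k] is taken to mean the host kernel.

```python
import re
from collections import defaultdict

# Marker meanings as described above; [k] assumed to be the host kernel.
LEVELS = {"g": "guest kernel", "u": "guest user", "k": "host kernel"}

# overhead%  command  shared-object  [marker]  symbol
LINE_RE = re.compile(r"^\s*([\d.]+)%\s+(\S+)\s+(\S+)\s+\[(.)\]\s+(\S+)")

sample = """\
29.11% qemu-system-x86 [guest.kernel.kallsyms] [g] __ticket_spin_lock
5.88% tbench_srv [kernel.kallsyms] [k] ftrace_likely_update
3.88% qemu-system-x86 34c3255482 [u] 0x000034c3255482
1.83% tbench [kernel.kallsyms] [k] __lock_acquire
1.13% qemu-system-x86 [guest.kernel.kallsyms] [g] copy_user_generic_string
"""


def overhead_by_level(text):
    """Return {level name: summed overhead percent} for the report lines in text."""
    totals = defaultdict(float)
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip comment lines and anything unrecognized
        overhead, marker = float(m.group(1)), m.group(4)
        totals[LEVELS.get(marker, marker)] += overhead
    return dict(totals)


for level, pct in sorted(overhead_by_level(sample).items()):
    print(f"{level}: {pct:.2f}%")
```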

4) --guestmount example. I started 2 guest os, running dbench in the 1st and tbench in the 2nd.
[root@lkp-ne01 norm]# perf kvm --host --guest --guestmount=/home/ymzhang/guestmount/ top
---------------------------------------------------------------------------------------------------------------------------------------
PerfTop: 16014 irqs/sec kernel: 1.8% us: 0.0% guest kernel:75.5% guest us:22.7% exact: 0.0% [1000Hz cycles], (all, 16 CPUs)
---------------------------------------------------------------------------------------------------------------------------------------

samples pcnt function DSO
_______ _____ ________________________ ________________________________________________________________

16583.00 9.3% __ticket_spin_lock [guest.kernel.kallsyms.3067]
7178.00 4.0% copy_user_generic_string [guest.kernel.kallsyms.3067]
4637.00 2.6% copy_user_generic_string [guest.kernel.kallsyms.3187]
2495.00 1.4% schedule [guest.kernel.kallsyms.3187]
2322.00 1.3% tcp_sendmsg [guest.kernel.kallsyms.3187]
2255.00 1.3% __d_lookup [guest.kernel.kallsyms.3067]
1892.00 1.1% __switch_to [guest.kernel.kallsyms.3187]
1884.00 1.1% kmem_cache_alloc [guest.kernel.kallsyms.3067]
1809.00 1.0% tcp_ack [guest.kernel.kallsyms.3187]
1733.00 1.0% _atomic_dec_and_lock [guest.kernel.kallsyms.3067]
1707.00 1.0% tcp_transmit_skb [guest.kernel.kallsyms.3187]
1612.00 0.9% tcp_recvmsg [guest.kernel.kallsyms.3187]
1546.00 0.9% __kmalloc [guest.kernel.kallsyms.3067]
1538.00 0.9% __ticket_spin_lock [guest.kernel.kallsyms.3187]
1467.00 0.8% link_path_walk [guest.kernel.kallsyms.3067]
1403.00 0.8% path_get [guest.kernel.kallsyms.3067]
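
Since the guest process id is appended to the DSO name (see ChangeLog V4),
samples can be grouped per guest instance. The sketch below does that for a
few lines copied from the output above. It assumes the
[guest.kernel.kallsyms.<pid>] naming shown here, and the name samples_per_guest
is hypothetical.

```python
import re
from collections import defaultdict

# samples  pcnt  function  [guest.kernel.kallsyms.<pid>]
DSO_RE = re.compile(
    r"^\s*([\d.]+)\s+[\d.]+%\s+(\S+)\s+\[guest\.kernel\.kallsyms\.(\d+)\]"
)

top_lines = """\
16583.00 9.3% __ticket_spin_lock [guest.kernel.kallsyms.3067]
7178.00 4.0% copy_user_generic_string [guest.kernel.kallsyms.3067]
4637.00 2.6% copy_user_generic_string [guest.kernel.kallsyms.3187]
2495.00 1.4% schedule [guest.kernel.kallsyms.3187]
"""


def samples_per_guest(text):
    """Return {guest pid: total samples} for guest kernel lines in text."""
    totals = defaultdict(float)
    for line in text.splitlines():
        m = DSO_RE.match(line)
        if m is None:
            continue  # not a guest kernel line with a pid suffix
        totals[int(m.group(3))] += float(m.group(1))
    return dict(totals)


for pid, samples in sorted(samples_per_guest(top_lines).items()):
    print(f"guest pid {pid}: {samples:.0f} samples")
```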

Signed-off-by: Zhang Yanmin <yanmin_zhang@xxxxxxxxxxxxxxx>


