[PATCH V2 1/1] kvm/vmx: Add a tracepoint write_tsc_offset

From: Yoshihiro YUNOMAE
Date: Tue Jun 04 2013 - 04:36:58 EST

Add a tracepoint write_tsc_offset for tracing TSC offset change.
We want to merge ftrace's trace data of guest OSs and the host OS using
TSC for timestamp in chronological order. We need "TSC offset" values for
each guest when merge those because the TSC value on a guest is always the
host TSC plus guest's TSC offset. If we get the TSC offset values, we can
calculate the host TSC value for each guest events from the TSC offset and
the event TSC value. The host TSC values of the guest events are used when we
want to merge trace data of guests and the host in chronological order.
(Note: the trace_clock of both the host and the guest must be set x86-tsc in
this case)

TSC offset is stored in the VMCS by vmx_write_tsc_offset() or
vmx_adjust_tsc_offset(). KVM executes the former function when a guest boots.
The latter function is executed when kvm clock is updated. Only host can read
TSC offset value from VMCS, so a host needs to output TSC offset value
when TSC offset is changed.

Since the TSC offset is not often changed, it could be overwritten by other
frequent events while tracing. To avoid that, I recommend to use a special
instance for getting this event:

1. set a instance before booting a guest
# cd /sys/kernel/debug/tracing/instances
# mkdir tsc_offset
# cd tsc_offset
# echo x86-tsc > trace_clock
# echo 1 > events/kvm/kvm_write_tsc_offset/enable

2. boot a guest

Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@xxxxxxxxxxx>
Cc: Marcelo Tosatti <mtosatti@xxxxxxxxxx>
Cc: Gleb Natapov <gleb@xxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
arch/x86/kvm/trace.h | 18 ++++++++++++++++++
arch/x86/kvm/vmx.c | 3 +++
arch/x86/kvm/x86.c | 1 +
3 files changed, 22 insertions(+)

diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index fe5e00e..9c22e39 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -815,6 +815,24 @@ TRACE_EVENT(kvm_track_tsc,
__print_symbolic(__entry->host_clock, host_clocks))

+ TP_PROTO(__u64 previous_tsc_offset, __u64 next_tsc_offset),
+ TP_ARGS(previous_tsc_offset, next_tsc_offset),
+ TP_STRUCT__entry(
+ __field( __u64, previous_tsc_offset )
+ __field( __u64, next_tsc_offset )
+ ),
+ TP_fast_assign(
+ __entry->previous_tsc_offset = previous_tsc_offset;
+ __entry->next_tsc_offset = next_tsc_offset;
+ ),
+ TP_printk("previous=%llu next=%llu",
+ __entry->previous_tsc_offset, __entry->next_tsc_offset)
#endif /* CONFIG_X86_64 */

#endif /* _TRACE_KVM_H */
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 25a791e..00f7dde 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2096,6 +2096,7 @@ static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
(nested_cpu_has(vmcs12, CPU_BASED_USE_TSC_OFFSETING) ?
vmcs12->tsc_offset : 0));
} else {
+ trace_kvm_write_tsc_offset(vmcs_read64(TSC_OFFSET), offset);
vmcs_write64(TSC_OFFSET, offset);
@@ -2103,6 +2104,8 @@ static void vmx_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
static void vmx_adjust_tsc_offset(struct kvm_vcpu *vcpu, s64 adjustment, bool host)
u64 offset = vmcs_read64(TSC_OFFSET);
+ trace_kvm_write_tsc_offset(offset, offset + adjustment);
vmcs_write64(TSC_OFFSET, offset + adjustment);
if (is_guest_mode(vcpu)) {
/* Even when running L2, the adjustment needs to apply to L1 */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 05a8b1a..c942a0c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7264,3 +7264,4 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_intr_vmexit);

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/