> > Is live migration necessary on pv perf support?
>
> Yes.

Ok. With the PV perf interface, host perf saves all counter information into the
perf_event structure. To support live migration, we would need to save the whole
host perf_event structure, or at least perf_event->count and perf_event->attr,
and then recreate the host perf_event after migration.

I checked the qemu-kvm code and most of live migration seems to be about saving
CPU state, so it looks hard to fit the perf PV interface into the current live
migration scheme. Any suggestions?
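Just to make the point concrete, here is a minimal sketch (assuming a hypothetical
pv_perf_migration_state structure that is not part of this patch) of the per-event
state that would have to be transferred and used to recreate the host perf_event
on the destination:

#include <linux/perf_event.h>
#include <linux/types.h>

/*
 * Hypothetical sketch only: the minimum per-event state that would have to
 * travel with the VM so the destination host can recreate the perf_event.
 */
struct pv_perf_migration_state {
	int guest_event_id;		/* event id allocated by the guest */
	u64 count;			/* perf_event->count at save time */
	struct perf_event_attr attr;	/* perf_event->attr, used to recreate the event */
};

On the destination, the attr could be fed to something like
perf_event_create_kernel_counter() and the saved count folded back into the new
event, but how that fits into the current qemu-kvm migration flow is exactly the
open question here.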
> What about documentation for individual fields? Especially type, config, and
> flags, but also the others.

They are really perf implementation specific. Even the perf_event definition has
no documentation beyond code comments. I will add a short explanation around the
new structure definition.
+guest_perf_event->count saves the latest count of the event.
+guest_perf_event->overflows means how many times this event has overflowed
+since the guest OS last processed it. The host kernel just increments
+guest_perf_event->overflows when the event overflows. The guest kernel should
+use an atomic_cmpxchg to reset guest_perf_event->overflows to 0, in case there
+is a race between the reset by the guest OS and the host kernel's update.

> Is overflows really needed?

Theoretically, we could remove it, but keeping it simplifies the implementation
and keeps the changes to the generic perf code as small as possible. A sketch of
the guest-side handling is shown below.
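As a rough illustration of the count/overflows handling described above, here is
a minimal guest-side sketch; the structure layout and the helper name
guest_perf_consume_overflows are assumptions, not taken from the patch:

#include <linux/atomic.h>
#include <linux/types.h>

/* Assumed layout of the shared per-event area; the real patch may differ. */
struct guest_perf_event {
	u64 count;		/* latest count, written by the host */
	atomic_t overflows;	/* overflows since the guest last processed them */
};

/*
 * Consume pending overflows: snapshot the value, then clear it with
 * atomic_cmpxchg.  If the host increments overflows between the read and
 * the reset, the cmpxchg fails and the new overflow is picked up on the
 * next pass instead of being lost.
 */
static int guest_perf_consume_overflows(struct guest_perf_event *ev)
{
	int seen = atomic_read(&ev->overflows);

	if (seen && atomic_cmpxchg(&ev->overflows, seen, 0) == seen)
		return seen;

	return 0;
}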
> Since the guest can use NMI to read the counter, it should have the
> highest possible priority, and thus it shouldn't see any overflow unless
> it configured the threshold really low. If we drop overflow, we can use
> the RDPMC instruction instead of KVM_PERF_OP_READ. This allows the guest
> to allow userspace to read a counter, or prevent userspace from reading
> the counter, by setting cr4.pce.

1) The paravirt perf interface is meant to hide the PMU hardware in the host OS;
the guest OS shouldn't access the PMU hardware directly. We could expose the PMU
hardware to the guest OS directly, but that would be a different guest PMU
support method and shouldn't be part of the paravirt interface.

2) Consider the following scenario: a PMU counter overflows and the NMI causes
the guest OS to vmexit to the host kernel. The host kernel schedules the vcpu
thread to another physical CPU before vmentering the guest OS again, so the
guest would then RDPMC the counter on a different CPU.

So I think the above discussion is really about how to expose PMU hardware to
the guest OS. I will also look into that method after the paravirt interface is
done.
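For reference, the RDPMC path discussed above would look roughly like this from
guest code; this is only a sketch of the instruction usage, not something the
paravirt patch proposes, and CR4.PCE is what gates whether user space may
execute it at all:

#include <stdint.h>

/*
 * Read a performance counter directly with RDPMC.  'index' selects the PMC
 * (passed in ECX); the 64-bit result comes back in EDX:EAX.
 */
static inline uint64_t read_pmc(uint32_t index)
{
	uint32_t lo, hi;

	asm volatile("rdpmc" : "=a"(lo), "=d"(hi) : "c"(index));
	return ((uint64_t)hi << 32) | lo;
}

The scheduling scenario in point 2 is exactly why such a raw RDPMC value is only
meaningful if the vcpu stays on the same physical PMU between configuration and
read.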
+Host kernel saves the count and overflow update information into the
+guest_perf_event pointed to by guest_perf_event_param->guest_event_addr.
+
+After the host kernel creates the event, the event is in disabled state.
+
+This hypercall returns 0 when the host kernel creates the event successfully,
+or another value if it fails.
+
+3) Enable event at host side:
+kvm_hypercall2(KVM_PERF_OP, KVM_PERF_OP_ENABLE, id);
+
+Parameter id is the event id allocated by the guest OS. The guest OS needs to
+call this hypercall to enable the event on the host side. Then, the host side
+will actually start collecting statistics with this event.
+
+This hypercall returns 0 if the host kernel succeeds, or another value if it fails.
+
+
+4) Disable event at host side:
+kvm_hypercall2(KVM_PERF_OP, KVM_PERF_OP_DISABLE, id);
+
+Parameter id is the event id allocated by the guest OS. The guest OS needs to
+call this hypercall to disable the event on the host side. Then, the host side
+will stop the statistics collection initiated by the event.
+
+This hypercall returns 0 if the host kernel succeeds, or another value if it fails.
+
+
+5) Close event at host side:
+kvm_hypercall2(KVM_PERF_OP, KVM_PERF_OP_CLOSE, id);
+It will close and delete the event on the host side.
> What about using MSRs to configure the counter like real hardware? That
> takes care of live migration, since we already migrate MSRs. At the end
> of the migration userspace will read all config and counter data from
> the source and transfer it to the destination. This should work with
> existing userspace since we query the MSR index list from the host.

Yes, but that would belong to the method that exposes PMU hardware to the guest
OS directly. A guest-side sketch of the hypercall sequence documented above is
given below.
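To make the enable/disable/close sequence concrete, here is a minimal guest-side
sketch built on the hypercalls documented above; kvm_hypercall2 and the
KVM_PERF_OP_* constants come from this thread, while guest_perf_run, the shared
guest_perf_event pointer, and the error handling are assumptions for
illustration only:

#include <linux/kvm_para.h>
#include <linux/printk.h>

/*
 * Life cycle of one event from the guest side.  'id' is the event id the
 * guest allocated when it opened the event; 'ev' is the shared area the
 * host updates (see the earlier guest_perf_event sketch).
 */
static long guest_perf_run(int id, struct guest_perf_event *ev)
{
	long ret;

	ret = kvm_hypercall2(KVM_PERF_OP, KVM_PERF_OP_ENABLE, id);
	if (ret)
		return ret;	/* host could not enable the event */

	/* ... profiled code runs; the host updates ev->count/ev->overflows ... */

	kvm_hypercall2(KVM_PERF_OP, KVM_PERF_OP_DISABLE, id);

	pr_info("pv perf event %d: count %llu\n", id,
		(unsigned long long)ev->count);

	return kvm_hypercall2(KVM_PERF_OP, KVM_PERF_OP_CLOSE, id);
}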