Hi Peter,
On 2020/4/24 20:16, Peter Zijlstra wrote:
> On Thu, Apr 23, 2020 at 04:14:09PM +0800, Like Xu wrote:
>> +static int intel_pmu_create_lbr_event(struct kvm_vcpu *vcpu)
>> +{
>> +	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
>> +	struct perf_event *event;
>> +
>> +	/*
>> +	 * The perf_event_attr is constructed in a minimal but sufficient way:
>> +	 * - set 'pinned = true' to make it task pinned so that if another
>> +	 *   cpu pinned event reclaims LBR, event->oncpu will be set to -1;
>> +	 *
>> +	 * - set 'sample_type = PERF_SAMPLE_BRANCH_STACK' and
>> +	 *   'exclude_host = true' to mark it as a guest LBR event, which
>> +	 *   tells host perf to schedule it with nothing but a fake counter;
>> +	 *   check is_guest_lbr_event() and intel_guest_event_constraints();
>> +	 *
>> +	 * - set 'branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
>> +	 *   PERF_SAMPLE_BRANCH_USER' to configure it in callstack mode,
>> +	 *   which allocates 'ctx->task_ctx_data' and asks the host perf
>> +	 *   subsystem to save/restore guest LBR records on host context
>> +	 *   switches; check branch_user_callstack() and
>> +	 *   intel_pmu_lbr_sched_task();
>> +	 */
>> +	struct perf_event_attr attr = {
>> +		.type = PERF_TYPE_RAW,
> This is not right; this needs a .config
Now we know the default value .config = 0 for attr is not acceptable.
> And I suppose that is why you need that horrible:
> needs_guest_lbr_without_counter() thing to begin with.
Do you suggest using an event->attr.config check to replace the
"needs_branch_stack(event) && is_kernel_event(event) &&
event->attr.exclude_host" check for guest LBR events?
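For example (a minimal sketch only; the INTEL_FIXED_VLBR_EVENT name
below is mine for illustration, not an existing define):

static inline bool is_guest_lbr_event(struct perf_event *event)
{
	/* Identify a guest LBR event by a reserved pseudo event
	 * encoding in attr.config, instead of the three-part check
	 * quoted above. */
	return needs_branch_stack(event) &&
	       (event->attr.config & INTEL_ARCH_EVENT_MASK) ==
						INTEL_FIXED_VLBR_EVENT;
}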
> Please allocate yourself an event from the pseudo event range:
> event==0x00. Currently we only have umask==3 for Fixed2 and umask==4
> for Fixed3, given you claim 58, which is effectively Fixed25,
> umask==0x1a might be appropriate.
OK, I assume that adding one more field, ".config = 0x1a00", to the
perf_event_attr is sufficient to allocate guest LBR events.
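Something like the following (the macro name is mine and only
illustrative; the final value is whatever ends up reserved):

/* Pseudo event for the guest LBR event: event select 0x00 from the
 * pseudo event range with umask 0x1a, i.e. config 0x1a00. */
#define INTEL_FIXED_VLBR_EVENT	0x1a00

The attr in intel_pmu_create_lbr_event() would then carry
".config = INTEL_FIXED_VLBR_EVENT".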
Also, I suppose we need to claim 0x0000 as an error, so that other
people won't try this again.
Does the following fix address your concern on this?
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 2405926e2dba..32d2a3f8c51f 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -498,6 +498,9 @@ int x86_pmu_max_precise(void)
 int x86_pmu_hw_config(struct perf_event *event)
 {
+	if (unlikely(!(event->attr.config & X86_ARCH_EVENT_MASK)))
+		return -EINVAL;
+
 	if (event->attr.precise_ip) {
 		int precise = x86_pmu_max_precise();
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 2e6c59308344..bdba87a6f0af 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -47,6 +47,8 @@
 	(ARCH_PERFMON_EVENTSEL_EVENT | (0x0FULL << 32))
 #define INTEL_ARCH_EVENT_MASK	\
 	(ARCH_PERFMON_EVENTSEL_UMASK | ARCH_PERFMON_EVENTSEL_EVENT)
+#define X86_ARCH_EVENT_MASK	\
+	(ARCH_PERFMON_EVENTSEL_UMASK | ARCH_PERFMON_EVENTSEL_EVENT)
 #define AMD64_L3_SLICE_SHIFT				48
 #define AMD64_L3_SLICE_MASK
>> +		.size = sizeof(attr),
>> +		.pinned = true,
>> +		.exclude_host = true,
>> +		.sample_type = PERF_SAMPLE_BRANCH_STACK,
>> +		.branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
>> +					PERF_SAMPLE_BRANCH_USER,
>> +	};
>> +
>> +	if (unlikely(pmu->lbr_event))
>> +		return 0;
>> +
>> +	event = perf_event_create_kernel_counter(&attr, -1,
>> +						current, NULL, NULL);
>> +	if (IS_ERR(event)) {
>> +		pr_debug_ratelimited("%s: failed %ld\n",
>> +				__func__, PTR_ERR(event));
>> +		return -ENOENT;
>> +	}
>> +	pmu->lbr_event = event;
>> +	pmu->event_count++;
>> +	return 0;
>> +}
> Also, what happens if you fail programming due to a conflicting cpu
> event? That pinned doesn't guarantee you'll get the event, it just means
> you'll error instead of getting RR.
I didn't find any code that checks the event state; erroring instead
of getting RR is what we expect here.
If KVM fails to program the guest LBR event due to a conflicting cpu
event, the LBR registers will not be passed through to the guest, and
KVM will return zero for any guest access to LBR records until the
next attempt to program the guest LBR event.
Every time before the cpu enters non-root mode, where irqs are
disabled, the "event->oncpu != -1" check will be applied
(more details in the comment around intel_pmu_availability_check()).
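In other words, something along these lines (a sketch only; the helper
name and shape are illustrative, based on the behavior described above):

/*
 * Called with irqs disabled, right before entering non-root mode.
 * If host perf reclaimed the LBR via a conflicting cpu pinned event,
 * it has set event->oncpu back to -1; in that case KVM must not pass
 * the LBR MSRs through, and guest reads of them return zero.
 */
static bool intel_pmu_lbr_event_is_scheduled(struct kvm_vcpu *vcpu)
{
	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);

	return pmu->lbr_event && READ_ONCE(pmu->lbr_event->oncpu) != -1;
}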
The guest administrator is supposed to know that guest LBR records
may be inaccurate if someone on the host side is using LBR to record
the guest or the hypervisor.
Is this acceptable to you?
If there is anything that needs to be improved, please let me know.
Thanks,
Like Xu