[PATCH v2 3/7] sched,perf,kvm: Fix preemption condition

From: Peter Zijlstra
Date: Fri Jun 11 2021 - 04:34:49 EST


When run from the sched-out path (preempt_notifier or perf_event),
p->state is irrelevant for determining preemption. A task can get
preempted with !task_is_running() just fine, for example after it has
set its state to sleep but before it has called schedule().

The right indicator for preemption is whether the task is still on the
runqueue in the sched-out path.
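
To illustrate the difference between the two checks, here is a minimal
userspace sketch. It is a toy model, not kernel code: struct toy_task,
the TOY_* states and all helper names are made up for illustration and
only mimic the roles of p->state and p->on_rq.

```c
#include <stdbool.h>

/* Toy stand-ins for the kernel's task fields (hypothetical names). */
enum toy_state {
	TOY_TASK_RUNNING = 0,
	TOY_TASK_INTERRUPTIBLE = 1,
};

struct toy_task {
	enum toy_state state;	/* what the task intends to do next */
	bool on_rq;		/* is the task still queued on the runqueue? */
};

/* Old check: sched-out counts as preemption only if state is RUNNING. */
static bool old_is_preempted(const struct toy_task *t)
{
	return t->state == TOY_TASK_RUNNING;
}

/* New check: sched-out counts as preemption if the task is still queued. */
static bool new_is_preempted(const struct toy_task *t)
{
	return t->on_rq;
}

/*
 * The task has marked itself as sleeping but is preempted before it
 * reaches schedule(): it was never dequeued, so on_rq stays true.
 */
static struct toy_task preempted_before_sleep(void)
{
	struct toy_task t = { .state = TOY_TASK_INTERRUPTIBLE, .on_rq = true };
	return t;
}

/* The task really blocked: it was dequeued on its way out. */
static struct toy_task voluntary_sleep(void)
{
	struct toy_task t = { .state = TOY_TASK_INTERRUPTIBLE, .on_rq = false };
	return t;
}
```

In this model old_is_preempted() misses the preempted_before_sleep()
case, while new_is_preempted() reports it and still returns false for
voluntary_sleep().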

Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Acked-by: Mark Rutland <mark.rutland@xxxxxxx>
---
kernel/events/core.c | 7 +++----
virt/kvm/kvm_main.c | 2 +-
2 files changed, 4 insertions(+), 5 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8568,13 +8568,12 @@ static void perf_event_switch(struct tas
 		},
 	};

-	if (!sched_in && task->state == TASK_RUNNING)
+	if (!sched_in && task->on_rq) {
 		switch_event.event_id.header.misc |=
 				PERF_RECORD_MISC_SWITCH_OUT_PREEMPT;
+	}

-	perf_iterate_sb(perf_event_switch_output,
-		       &switch_event,
-		       NULL);
+	perf_iterate_sb(perf_event_switch_output, &switch_event, NULL);
 }

/*
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4869,7 +4869,7 @@ static void kvm_sched_out(struct preempt
 {
 	struct kvm_vcpu *vcpu = preempt_notifier_to_vcpu(pn);

-	if (current->state == TASK_RUNNING) {
+	if (current->on_rq) {
 		WRITE_ONCE(vcpu->preempted, true);
 		WRITE_ONCE(vcpu->ready, true);
 	}