Re: [PATCH v2 0/9] sched/kvm: Semantics-aware vCPU scheduling for oversubscribed KVM

From: Christian Borntraeger

Date: Thu Mar 26 2026 - 10:57:50 EST

On 19.12.25 at 04:53, Wanpeng Li wrote:

> From: Wanpeng Li <wanpengli@xxxxxxxxxxx>
>
> This series addresses long-standing yield_to() inefficiencies in
> virtualized environments through two complementary mechanisms: a vCPU
> debooster in the scheduler and IPI-aware directed yield in KVM.
>
> Problem Statement
> -----------------
>
> In overcommitted virtualization scenarios, vCPUs frequently spin on locks
> held by other vCPUs that are not currently running. The kernel's
> paravirtual spinlock support detects these situations and calls yield_to()
> to boost the lock holder, allowing it to run and release the lock.
>
> However, the current implementation has two critical limitations:
>
> 1. Scheduler-side limitation:
>
> yield_to_task_fair() relies solely on set_next_buddy() to provide
> preference to the target vCPU. This buddy mechanism only offers
> immediate, transient preference. Once the buddy hint expires (typically
> after one scheduling decision), the yielding vCPU may preempt the target
> again, especially in nested cgroup hierarchies where vruntime domains
> differ.
>
> This creates a ping-pong effect: the lock holder runs briefly, gets
> preempted before completing critical sections, and the yielding vCPU
> spins again, triggering another futile yield_to() cycle. The overhead
> accumulates rapidly in workloads with high lock contention.

Wanpeng,

Late, but not forgotten.

Richie Buturla gave this a try on s390, with some variations but still
without cgroup support (that is the next step).
The numbers look very promising (diag 9c is our yield_to hypercall). At
very high overcommit ratios the benefit shrinks again, but the results are
still positive. We are probably running into other limits there.

2:1 Overcommit Ratio:
diag9c calls: 225,804,073 → 213,913,266 (-5.3%)
Dbench thrpt (per-run mean): +1.3%
Dbench thrpt (per-run median): +0.8%
Dbench thrpt (total across runs): +1.3%
Dbench thrpt (avg/VM): +1.3%

4:1:
diag9c calls: 833,455,152 → 556,597,627 (-33.2%)
Dbench thrpt (per-run mean): +7.2%
Dbench thrpt (per-run median): +8.5%
Dbench thrpt (total across runs): +7.2%
Dbench thrpt (avg/VM): +7.2%

6:1:
diag9c calls: 967,501,378 → 737,178,419 (-23.8%)
Dbench thrpt (per-run mean): +5.1%
Dbench thrpt (per-run median): +4.8%
Dbench thrpt (total across runs): +5.1%
Dbench thrpt (avg/VM): +5.1%

8:1:
diag9c calls: 872,165,596 → 653,481,530 (-25.1%)
Dbench thrpt (per-run mean): +11.5%
Dbench thrpt (per-run median): +11.4%
Dbench thrpt (total across runs): +11.5%
Dbench thrpt (avg/VM): +11.5%

9:1:
diag9c calls: 809,384,976 → 587,597,163 (-27.4%)
Dbench thrpt (per-run mean): +4.5%
Dbench thrpt (per-run median): +4.0%
Dbench thrpt (total across runs): +4.5%
Dbench thrpt (avg/VM): +4.5%

10:1:
diag9c calls: 711,772,971 → 477,448,374 (-32.9%)
Dbench thrpt (per-run mean): +3.6%
Dbench thrpt (per-run median): +1.6%
Dbench thrpt (total across runs): +3.6%
Dbench thrpt (avg/VM): +3.6%
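
For anyone who wants to cross-check the diag9c percentages, here is a
minimal sketch that recomputes the relative change from the raw call counts
listed above. The names DIAG9C_CALLS, percent_change and summarize are made
up for this illustration and are not part of the series or of any test
harness; the counts are copied from the results as posted.

```python
# Baseline and patched diag 9c call counts per overcommit ratio,
# copied from the results above (baseline, patched).
DIAG9C_CALLS = {
    "2:1": (225_804_073, 213_913_266),
    "4:1": (833_455_152, 556_597_627),
    "6:1": (967_501_378, 737_178_419),
    "8:1": (872_165_596, 653_481_530),
    "9:1": (809_384_976, 587_597_163),
    "10:1": (711_772_971, 477_448_374),
}


def percent_change(before, after):
    """Relative change from before to after, in percent (negative = fewer calls)."""
    return (after - before) / before * 100.0


def summarize(calls):
    """Map each overcommit ratio to its percent change, rounded to one decimal."""
    return {
        ratio: round(percent_change(before, after), 1)
        for ratio, (before, after) in calls.items()
    }


if __name__ == "__main__":
    for ratio, change in summarize(DIAG9C_CALLS).items():
        print(f"{ratio:>5}: {change:+.1f}%")
```

The same helper could be applied to the Dbench throughput figures, but only
the relative deltas were posted for those, so there is nothing to recompute.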