[tip: sched/core] sched/core: Drop spinlocks on contention iff kernel is preemptible

From: tip-bot2 for Sean Christopherson
Date: Thu Jun 06 2024 - 13:45:21 EST

Next message: tip-bot2 for Sean Christopherson: "[tip: sched/core] sched/core: Move preempt_model_*() helpers from sched.h to preempt.h"
Previous message: Reinette Chatre: "[PATCH V7 2/2] KVM: selftests: Add test for configure of x86 APIC bus frequency"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

The following commit has been merged into the sched/core branch of tip:

Commit-ID: c793a62823d1ce8f70d9cfc7803e3ea436277cda
Gitweb: https://git.kernel.org/tip/c793a62823d1ce8f70d9cfc7803e3ea436277cda
Author: Sean Christopherson <seanjc@xxxxxxxxxx>
AuthorDate: Mon, 27 May 2024 17:34:48 -07:00
Committer: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
CommitterDate: Wed, 05 Jun 2024 16:52:36 +02:00

sched/core: Drop spinlocks on contention iff kernel is preemptible

Use preempt_model_preemptible() to detect a preemptible kernel when
deciding whether or not to reschedule in order to drop a contended
spinlock or rwlock. Because PREEMPT_DYNAMIC selects PREEMPTION, kernels
built with PREEMPT_DYNAMIC=y will yield contended locks even if the live
preemption model is "none" or "voluntary". In short, make kernels with
dynamically selected models behave the same as kernels with statically
selected models.

Somewhat counter-intuitively, NOT yielding a lock can provide better
latency for the relevant tasks/processes. E.g. KVM x86's mmu_lock, a
rwlock, is often contended between an invalidation event (takes mmu_lock
for write) and a vCPU servicing a guest page fault (takes mmu_lock for
read). For _some_ setups, letting the invalidation task complete even
if there is mmu_lock contention provides lower latency for *all* tasks,
i.e. the invalidation completes sooner *and* the vCPU services the guest
page fault sooner.

But even KVM's mmu_lock behavior isn't uniform, e.g. the "best" behavior
can vary depending on the host VMM, the guest workload, the number of
vCPUs, the number of pCPUs in the host, why there is lock contention, etc.

In other words, simply deleting the CONFIG_PREEMPTION guard (or doing the
opposite and removing contention yielding entirely) needs to come with a
big pile of data proving that changing the status quo is a net positive.

Opportunistically document this side effect of preempt=full, as yielding
contended spinlocks can have significant, user-visible impact.

Fixes: c597bfddc9e9 ("sched: Provide Kconfig support for default dynamic preempt mode")
Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Reviewed-by: Ankur Arora <ankur.a.arora@xxxxxxxxxx>
Reviewed-by: Chen Yu <yu.c.chen@xxxxxxxxx>
Link: https://lore.kernel.org/kvm/ef81ff36-64bb-4cfe-ae9b-e3acf47bff24@xxxxxxxxxxx
---
Documentation/admin-guide/kernel-parameters.txt | 4 +++-
include/linux/spinlock.h | 14 ++++++--------
2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 500cfa7..555e6b5 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4752,7 +4752,9 @@
none - Limited to cond_resched() calls
voluntary - Limited to cond_resched() and might_sleep() calls
full - Any section that isn't explicitly preempt disabled
- can be preempted anytime.
+ can be preempted anytime. Tasks will also yield
+ contended spinlocks (if the critical section isn't
+ explicitly preempt disabled beyond the lock itself).

print-fatal-signals=
[KNL] debug: print fatal signals
diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h
index 3fcd20d..63dd8cf 100644
--- a/include/linux/spinlock.h
+++ b/include/linux/spinlock.h
@@ -462,11 +462,10 @@ static __always_inline int spin_is_contended(spinlock_t *lock)
*/
static inline int spin_needbreak(spinlock_t *lock)
{
-#ifdef CONFIG_PREEMPTION
+ if (!preempt_model_preemptible())
+ return 0;
+
return spin_is_contended(lock);
-#else
- return 0;
-#endif
}

/*
@@ -479,11 +478,10 @@ static inline int spin_needbreak(spinlock_t *lock)
*/
static inline int rwlock_needbreak(rwlock_t *lock)
{
-#ifdef CONFIG_PREEMPTION
+ if (!preempt_model_preemptible())
+ return 0;
+
return rwlock_is_contended(lock);
-#else
- return 0;
-#endif
}

/*

Next message: tip-bot2 for Sean Christopherson: "[tip: sched/core] sched/core: Move preempt_model_*() helpers from sched.h to preempt.h"
Previous message: Reinette Chatre: "[PATCH V7 2/2] KVM: selftests: Add test for configure of x86 APIC bus frequency"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]