Re: [PATCH] sched/core: Drop spinlocks on contention iff kernel is preemptible

From: Sean Christopherson
Date: Mon May 13 2024 - 11:34:09 EST


On Mon, May 13, 2024, Paolo Bonzini wrote:
> On 1/10/24 22:47, Sean Christopherson wrote:
> > Use preempt_model_preemptible() to detect a preemptible kernel when
> > deciding whether or not to reschedule in order to drop a contended
> > spinlock or rwlock. Because PREEMPT_DYNAMIC selects PREEMPTION, kernels
> > built with PREEMPT_DYNAMIC=y will yield contended locks even if the live
> > preemption model is "none" or "voluntary". In short, make kernels with
> > dynamically selected models behave the same as kernels with statically
> > selected models.
>
> Peter, looks like this patch fell through the cracks. Could this be applied
> for 6.10?
>
> There is a slightly confusing line in the commit message below that makes it
> read more like an RFC, but the patch fixes a CONFIG_PREEMPT_DYNAMIC
> regression with respect to static preemption models and has no functional
> change for !CONFIG_PREEMPT_DYNAMIC.
>
> > Somewhat counter-intuitively, NOT yielding a lock can provide better
> > latency for the relevant tasks/processes. E.g. KVM x86's mmu_lock, a
> > rwlock, is often contended between an invalidation event (takes mmu_lock
> > for write) and a vCPU servicing a guest page fault (takes mmu_lock for
> > read). For _some_ setups, letting the invalidation task complete even
> > if there is mmu_lock contention provides lower latency for *all* tasks,
> > i.e. the invalidation completes sooner *and* the vCPU services the guest
> > page fault sooner.
> >
> > But even KVM's mmu_lock behavior isn't uniform, e.g. the "best" behavior
> > can vary depending on the host VMM, the guest workload, the number of
> > vCPUs, the number of pCPUs in the host, why there is lock contention, etc.
> >
> > In other words, simply deleting the CONFIG_PREEMPTION guard (or doing the
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> This should be "deleting the preempt_model_preemptible() guard", given that
> the patch already deletes the CONFIG_PREEMPTION check and leaves only
> preempt_model_preemptible() in place.
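A minimal userspace sketch of the decision the patch changes, under simplifying assumptions: the kernel's spin_needbreak() takes a spinlock_t * and calls spin_is_contended(), whereas here contention is passed in as a bool. The enum, live_model, spin_needbreak_old(), spin_needbreak_new() and print_needbreak_table() are hypothetical names for illustration only, and preempt_model_preemptible() is reduced to "live model is full" (the real helper also covers PREEMPT_RT). Because PREEMPT_DYNAMIC selects PREEMPTION, the old compile-time check yields a contended lock no matter which model is live; the new check consults the live model.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Userspace model of a CONFIG_PREEMPT_DYNAMIC=y build. PREEMPT_DYNAMIC
 * selects PREEMPTION, so the Kconfig symbol is always defined here, even
 * though the live model may be "none" or "voluntary".
 */
#define CONFIG_PREEMPTION 1

/* Hypothetical stand-in for the boot/runtime-selected preemption model. */
enum preempt_model { MODEL_NONE, MODEL_VOLUNTARY, MODEL_FULL };
static enum preempt_model live_model = MODEL_NONE;

/* Simplified: the real helper also returns true for PREEMPT_RT. */
static bool preempt_model_preemptible(void)
{
	return live_model == MODEL_FULL;
}

/* Before the patch: compile-time check, so every dynamic build yields. */
static int spin_needbreak_old(bool contended)
{
#ifdef CONFIG_PREEMPTION
	return contended;
#else
	return 0;
#endif
}

/* After the patch: the live preemption model decides. */
static int spin_needbreak_new(bool contended)
{
	if (!preempt_model_preemptible())
		return 0;
	return contended;
}

/* Print the decision for a contended lock under each live model. */
static void print_needbreak_table(void)
{
	static const char *const names[] = { "none", "voluntary", "full" };

	for (int m = MODEL_NONE; m <= MODEL_FULL; m++) {
		live_model = (enum preempt_model)m;
		printf("%-9s old=%d new=%d\n", names[m],
		       spin_needbreak_old(true), spin_needbreak_new(true));
	}
}
```

Calling print_needbreak_table() shows the old check returning 1 for all three live models and the new check returning 1 only for "full", which matches what a kernel with a statically selected model would do.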

Note, this version won't apply cleanly; v2[*] handles the code movement and still
applies to Linus' tree.

[*] https://lore.kernel.org/all/20240312193911.1796717-1-seanjc@xxxxxxxxxx
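The needbreak result is consumed by helpers such as cond_resched_lock() and cond_resched_rwlock_write(), which roughly drop the lock, reschedule, and retake it when the lock is contended or a reschedule is pending. The toy model below uses hypothetical names (struct sim_lock, sim_needbreak(), invalidate_range()) and no real threads; it only counts how often a long invalidation would drop a persistently contended lock under each policy. It says nothing about which policy gives lower latency, which per the commit message depends on the VMM, the guest workload, and the vCPU/pCPU counts.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a lock whose holder may yield on contention. */
struct sim_lock {
	bool held;
	bool contended;      /* another task (e.g. a faulting vCPU) is waiting */
	unsigned int drops;  /* how many times the holder yielded the lock */
};

/* Hypothetical policy: yield only if the live model is preemptible and
 * the lock is contended (mirrors the post-patch needbreak decision). */
static bool sim_needbreak(const struct sim_lock *l, bool preemptible)
{
	return preemptible && l->contended;
}

/*
 * Process nr_items units of work (e.g. chunks of an invalidation) under
 * the lock, dropping and retaking it between units whenever the policy
 * says so. Returns the number of units processed.
 */
static size_t invalidate_range(struct sim_lock *l, size_t nr_items,
			       bool preemptible)
{
	size_t done = 0;

	l->held = true;
	while (done < nr_items) {
		done++;  /* one unit of work */
		if (done < nr_items && sim_needbreak(l, preemptible)) {
			l->held = false;  /* let the waiter in */
			l->drops++;
			/* In this model the waiter runs and leaves at once, but
			 * contention persists, so the next check yields again. */
			l->held = true;
		}
	}
	l->held = false;
	return done;
}
```

With eight units of work and constant contention, the non-preemptible policy finishes without ever dropping the lock, while the preemptible policy drops it between every pair of units (seven times).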