> On Wed, 8 Jan 2025 at 21:57, Waiman Long <llong@xxxxxxxxxx> wrote:
> >
> > On 1/7/25 8:59 AM, Kumar Kartikeya Dwivedi wrote:
> > > We ripped out PV and virtualization related bits from rqspinlock in an
> > > earlier commit; however, a fair lock performs poorly within a virtual
> > > machine when the lock holder is preempted. As such, retain the
> > > virt_spin_lock fallback to a test-and-set lock, but with timeout and
> > > deadlock detection.
> > >
> > > We don't integrate support for CONFIG_PARAVIRT_SPINLOCKS yet, as that
> > > requires more involved algorithmic changes and introduces more
> > > complexity. It can be done when the need arises in the future.
> >
> > virt_spin_lock() doesn't scale well. It is for hypervisors that don't
> > support PV qspinlock yet. Now rqspinlock() will be in this category.
>
> We would need to make algorithmic changes to the paravirt versions, which
> would be too much for this series, so I didn't go there.

I know. The paravirt part is the most difficult. It took me over a year
to work on the paravirt part of qspinlock to get it right and merged
upstream.
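For concreteness, here is a minimal userspace sketch of the kind of
test-and-set fallback with a bounded spin that the quoted commit message
describes; the names, the timeout policy, and the error code are
illustrative assumptions, not the actual code from the series.

/*
 * Toy model of a test-and-set lock whose acquisition gives up after a
 * timeout instead of spinning forever. Illustration only.
 */
#include <errno.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

struct tas_lock {
        atomic_int locked;              /* 0 = free, 1 = held */
};

static long long now_ns(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Spin on the test-and-set, but fail with -ETIMEDOUT rather than hang. */
static int tas_lock_timeout(struct tas_lock *l, long long timeout_ns)
{
        long long deadline = now_ns() + timeout_ns;

        for (;;) {
                int expected = 0;

                if (atomic_compare_exchange_weak(&l->locked, &expected, 1))
                        return 0;
                if (now_ns() > deadline)
                        return -ETIMEDOUT;      /* caller must handle the failure */
        }
}

static void tas_unlock(struct tas_lock *l)
{
        atomic_store(&l->locked, 0);
}

int main(void)
{
        struct tas_lock l = { .locked = 0 };

        printf("acquire: %d\n", tas_lock_timeout(&l, 1000000000LL));
        tas_unlock(&l);
        return 0;
}

A real implementation would also run deadlock checks inside the wait loop;
the sketch only shows the bounded-wait contract.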
> > I wonder if we should provide an option to disable rqspinlock and fall
> > back to the regular qspinlock with strict BPF locking semantics.
>
> That unfortunately won't work, because rqspinlock operates essentially
> like a trylock, where it is allowed to fail and callers must handle
> errors accordingly. Some of the users in BPF (e.g. in patch 17) remove
> their per-cpu nesting counts to rely on the AA deadlock detection of
> rqspinlock, which would cause a deadlock if we transparently replaced
> it with qspinlock as a fallback.
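For concreteness, the contract being described looks roughly like the
sketch below; res_lock_acquire()/res_lock_release() and the single-flag
"detection" are stand-ins for illustration, not the actual rqspinlock API.
The point is that every call site checks the return value and propagates
the error instead of assuming the lock was taken.

/*
 * Userspace model of the trylock-like contract: acquisition can fail and
 * callers must handle the error. Illustration only.
 */
#include <errno.h>
#include <stdio.h>

struct res_lock {
        int held;                       /* toy state: 1 while locked */
};

static int res_lock_acquire(struct res_lock *l)
{
        if (l->held)
                return -EDEADLK;        /* AA deadlock: fail instead of hanging */
        l->held = 1;
        return 0;
}

static void res_lock_release(struct res_lock *l)
{
        l->held = 0;
}

static struct res_lock bucket_lock;

/* A map operation propagates the error rather than entering the critical section. */
static int htab_update_elem(void)
{
        int ret = res_lock_acquire(&bucket_lock);

        if (ret)
                return ret;             /* e.g. surfaced to the caller as -EDEADLK */
        /* ... update the bucket ... */
        res_lock_release(&bucket_lock);
        return 0;
}

int main(void)
{
        printf("update: %d\n", htab_update_elem());

        /* Re-entrant attempt, as from a nesting program on the same CPU:
         * a fair qspinlock would hang here, the resilient lock reports an error. */
        if (!res_lock_acquire(&bucket_lock)) {
                printf("nested update: %d\n", htab_update_elem());
                res_lock_release(&bucket_lock);
        }
        return 0;
}

A transparent fallback to a fair, non-failing qspinlock would silently
break exactly this error path.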
> > Another question that I have is about the PREEMPT_RT kernel, which cannot
> > tolerate any locking stall. That will probably require disabling
> > rqspinlock if CONFIG_PREEMPT_RT is enabled.
>
> I think rqspinlock better maps to the raw spin lock variants, which
> stay as spin locks on RT kernels, and as you see in patches 17 and 18,
> BPF maps were already using the raw spin lock variants. To avoid
> stalling, we perform deadlock checks immediately when we enter the
> slow path, so for the cases where we rely upon rqspinlock to diagnose
> and report an error, we'll recover quickly. If we still hit the
> timeout, it is probably a different problem / bug anyway (and would
> have caused a kernel hang otherwise).
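To illustrate the last point, a rough userspace sketch of checking for an
AA deadlock up front, before any waiting; the held-locks table and every
name in it are assumptions made for illustration, not the kernel
implementation.

/*
 * Model of performing the AA deadlock check on slow-path entry, so the
 * error is reported immediately rather than after a timeout. Illustration only.
 */
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_HELD 31

struct held_locks {
        int cnt;
        void *locks[MAX_HELD];
};

/* One table per CPU in the kernel; a single one suffices for this sketch. */
static struct held_locks this_cpu_held;

static bool lock_already_held(void *lock)
{
        for (int i = 0; i < this_cpu_held.cnt; i++)
                if (this_cpu_held.locks[i] == lock)
                        return true;
        return false;
}

static int res_lock_slowpath(void *lock)
{
        /* Deadlock check before waiting at all, so AA cases fail fast. */
        if (lock_already_held(lock))
                return -EDEADLK;
        if (this_cpu_held.cnt == MAX_HELD)
                return -EBUSY;          /* table full; real code handles this differently */

        this_cpu_held.locks[this_cpu_held.cnt++] = lock;
        /* ... queue up / spin, with a timeout only as a last resort ... */
        return 0;
}

static void res_unlock(void *lock)
{
        (void)lock;
        this_cpu_held.cnt--;            /* toy bookkeeping: LIFO release order */
}

int main(void)
{
        int a;

        printf("first acquire:  %d\n", res_lock_slowpath(&a));
        printf("second acquire: %d\n", res_lock_slowpath(&a));     /* fails immediately */
        res_unlock(&a);
        return 0;
}

Because the common misuse is caught on entry like this, the timeout only
matters for genuinely pathological cases, which is the argument made above.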