> * Waiman Long <Waiman.Long@xxxxxx> wrote:
>
>> [...] However, there are 2 additional sources of mutex lockers besides those in
>> the sleep queue:
>>
>> 1. New tasks trying to acquire the mutex and currently in the fast path.
>> 2. Mutex spinners (CONFIG_MUTEX_SPIN_ON_OWNER on) who are spinning
>>    on the owner field and ready to acquire the mutex once the owner
>>    field changes.
>
> Yes - but I'm talking about spin/poll-waiters.

Thanks for the clarification.

>> The 2nd and 3rd patches are my attempts to limit the second type of mutex
>> lockers.
>
> Even the 1st patch seems to do that, it limits the impact of spin-loopers, right?
>
> Furthermore, since you are seeing this effect so profoundly, have you
> considered using another approach, such as queueing all the poll-waiters in
> some fashion?
>
> That would optimize your workload additionally: removing the 'stampede' of
> trylock attempts when an unlock happens - only a single wait-poller would get
> the lock.

The mutex code in the slowpath already puts the waiters into a sleep queue
and wakes up only one at a time.
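Roughly, the unlock side of the slowpath behaves like the simplified sketch
below (illustrative only, not the exact kernel/mutex.c code):

        static void mutex_unlock_slowpath_sketch(struct mutex *lock)
        {
                struct mutex_waiter *waiter;

                spin_lock(&lock->wait_lock);
                if (!list_empty(&lock->wait_list)) {
                        /* wake up only the first waiter in the sleep queue */
                        waiter = list_entry(lock->wait_list.next,
                                            struct mutex_waiter, list);
                        wake_up_process(waiter->task);
                }
                spin_unlock(&lock->wait_lock);
        }

So the sleeping waiters themselves do not stampede on unlock; the stampede
comes from the spinners and the fast-path lockers.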
> I'm fine with patch #1 [your numbers are proof enough that it helps while the low
> client count effect seems to be in the noise] - the questions that seem open to me
> are:
>
>  - Could the approach in patch #1 be further improved by an additional patch that
>    adds queueing to the _spinners_ in some fashion - like ticket spin locks try to
>    do in essence? Not queue the blocked waiters (they are already queued), but the
>    active spinners. This would have additional benefits, especially with a high
>    CPU count and a high NUMA factor, by removing the stampede effect as owners get
>    switched.

Yes, I think we can implement some kind of ticketing system for the spinners.
Similar to patch #2, we would have to add a new field to the mutex structure for
the head/tail ticket numbers, which adds a little more contention on the same
mutex cacheline when the ticket numbers are updated. I can think of an easy way
to do that without increasing the size of the mutex. I will try it out to see
what performance impact it will have.
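The rough idea is sketched below; the field and function names (spin_head,
spin_tail, mutex_spin_ticket_*) are made up here for illustration and are not
from the actual patches:

        /*
         * Each would-be spinner takes a ticket; only the spinner whose
         * ticket matches the head is allowed to spin on lock->owner, so
         * an unlock no longer triggers a stampede of trylock attempts.
         */
        static void mutex_spin_ticket_wait(struct mutex *lock)
        {
                int ticket = atomic_inc_return(&lock->spin_tail) - 1;

                /* wait for our turn before spinning on the owner field */
                while (atomic_read(&lock->spin_head) != ticket)
                        cpu_relax();

                /* ... then spin on lock->owner as mutex_spin_on_owner() does ... */
        }

        static void mutex_spin_ticket_next(struct mutex *lock)
        {
                /*
                 * Called by the current spinner when it acquires the lock
                 * or gives up and goes to the sleep queue, so the turn is
                 * always passed on to the next queued spinner.
                 */
                atomic_inc(&lock->spin_head);
        }

The two counters could be packed as a pair of 16-bit values in one 32-bit
word, similar to the x86 ticket spinlock, to avoid growing struct mutex.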
>  - Why does patch #2 have an effect? (it shouldn't at first glance) It has a
>    non-trivial cost, it increases the size of 'struct mutex' by 8 bytes, which
>    structure is embedded in numerous kernel data structures. When doing
>    comparisons I'd suggest comparing it not to just vanilla, but to a patch that
>    only extends the struct mutex data structure (and changes no code) - this
>    allows the isolation of cache layout change effects.

I think these two patches can have some performance impact because they allow
the CPUs to run other tasks that are waiting for CPU time instead of keeping
those CPUs busy spinning for the mutex to be acquired. It isn't a big problem
if only 1 or 2 threads are spinning, but it can be if most of the CPUs in the
system are wasting time spinning for the mutex. That raises the question: even
if we implement a ticket queuing system, does it make sense to limit the number
of spinners to just a few, say 3?
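Limiting the spinner count could be as simple as the sketch below
(MAX_MUTEX_SPINNERS and lock->spinners are made-up names for illustration,
not from the patches):

        #define MAX_MUTEX_SPINNERS      3

        /*
         * Returns true if this task may join the spinners; otherwise the
         * caller should skip optimistic spinning and go to the sleep queue.
         * The check+increment is racy, so the cap can be briefly exceeded,
         * but that is harmless for a heuristic like this.
         */
        static bool mutex_can_spin(struct mutex *lock)
        {
                if (atomic_read(&lock->spinners) >= MAX_MUTEX_SPINNERS)
                        return false;

                atomic_inc(&lock->spinners);
                return true;
        }

        static void mutex_done_spinning(struct mutex *lock)
        {
                atomic_dec(&lock->spinners);
        }

Whether a cap like that still helps once the spinners are properly queued is
exactly the open question above.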
>  - Patch #3 is rather ugly - and my hope would be that if spinners are queued in
>    some fashion it becomes unnecessary.