On Thu, 22 Sep 2016, Waiman Long wrote:
> Locking was done mostly by lock stealing. This is where most of the
> performance benefit comes from, not optimistic spinning.

What does the lock latency distribution of all this look like, and how fair
is the whole thing?

> The TO futexes are unfair as can be seen from the min/max thread times listed
> above. It took the fastest thread 0.07s to complete all the locking
> operations, whereas the slowest one needed 2.65s. However, the situation
> reverses when I changed the critical section to a 1us sleep. In this case,

A 1us sleep is going to add another syscall and therefore scheduling, so
what? Or did you just extend the critical section busy time?

> there will be no optimistic spinning. The performance results for 100k locking

        ^^^^ ????

> operations were listed below.
>                 wait-wake futex     PI futex        TO futex
>                 ---------------     --------        --------
> min time             0.06s            9.32s           4.76s
> max time             5.59s            9.36s           5.62s
> average time         3.25s            9.35s           5.41s
>
> In this case, the TO futexes are fairer but perform worse than the wait-wake
> futexes. That is because the lock handoff mechanism limits the amount of lock
> stealing in the TO futexes, while the wait-wake futexes have no such
> restriction. When I disabled lock handoff, the TO futexes would then perform
> similarly to the wait-wake futexes.

So the benefit of these newfangled futexes is only there for extremely short
critical sections and a gazillion of threads fighting for the same futex.
I really wonder how the average programmer should pick the right flavour,
not to talk about any useful decision for something like glibc to pick the
proper one.