Re: [RFC PATCH v2 3/5] futex: Throughput-optimized (TO) futexes

From: Thomas Gleixner
Date: Thu Sep 22 2016 - 16:41:20 EST

On Thu, 22 Sep 2016, Waiman Long wrote:
> BTW, my initial attempt at the new futex was to use the same workflow as the
> PI futexes, but with a mutex, which has optimistic spinning, instead of an
> rt_mutex. That version can double the throughput compared with PI futexes but
> is still far short of what can be achieved with wait-wake futexes. Looking at
> the performance figures from the patch:
>                wait-wake futex   PI futex   TO futex
>                ---------------   --------   --------
> max time                 3.49s     50.91s      2.65s
> min time                 3.24s     50.84s      0.07s
> average time             3.41s     50.90s      1.84s
> sys time               7m22.4s     55.73s    2m32.9s

That's really interesting. Do you have any explanation for these massive
system time differences?

> lock count           3,090,294  9,999,813    698,318
> unlock count         3,268,896  9,999,814        134
> The problem with a PI-futex-like version is that almost all the lock/unlock
> operations were done in the kernel, which added overhead and latency. Now
> looking at the numbers for the TO futexes, less than 1/10 of the lock
> operations were done in the kernel, and the number of unlocks was
> insignificant. Locking was done mostly by lock stealing. This is where most
> of the performance benefit comes from, not optimistic spinning.
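
For readers following along, here is a minimal sketch of what a user-space
"lock stealing" fast path could look like. It is an illustration only and
not the code from the patch: the sketch_lock type, the function names and
the lock word layout (0 means free, otherwise the owner's TID) are all
hypothetical. It assumes that a task may grab a free lock with a single
compare-and-swap regardless of who is queued in the kernel, which is why
most lock/unlock operations never become system calls.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock word: 0 means free, otherwise the owner's TID. */
typedef struct {
    _Atomic uint32_t word;
} sketch_lock;

/*
 * User-space fast path: take the lock if it is currently free, without
 * regard to any waiters (this is the "stealing" part). On failure a real
 * implementation would fall back to the futex() system call.
 */
static bool sketch_trylock(sketch_lock *l, uint32_t tid)
{
    uint32_t expected = 0;

    return atomic_compare_exchange_strong_explicit(
        &l->word, &expected, tid,
        memory_order_acquire, memory_order_relaxed);
}

/*
 * User-space unlock: succeeds only if the caller owns the lock and the
 * word holds nothing but its TID. Returns false otherwise, where a real
 * implementation would enter the kernel to wake or hand off to a waiter.
 */
static bool sketch_unlock(sketch_lock *l, uint32_t tid)
{
    uint32_t expected = tid;

    return atomic_compare_exchange_strong_explicit(
        &l->word, &expected, 0,
        memory_order_release, memory_order_relaxed);
}
```

A caller would try sketch_trylock(&lock, tid) first and only issue a system
call when it returns false. The sketch deliberately leaves out waiter bits,
spinning and the handoff mechanism, so on its own it offers no fairness
guarantee.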

What does the lock latency distribution of all this look like and how fair
is the whole thing?

> This is also the reason that a lock handoff mechanism is implemented to
> prevent lock starvation which is likely to happen without one.

Where is that lock handoff mechanism?