Re: [RFC] locking/mutex: Fix starvation of sleeping waiters

From: Imre Deak
Date: Tue Jul 19 2016 - 12:53:46 EST


On Mon, 2016-07-18 at 10:47 -0700, Jason Low wrote:
> On Mon, 2016-07-18 at 19:15 +0200, Peter Zijlstra wrote:
> > On Mon, Jul 18, 2016 at 07:16:47PM +0300, Imre Deak wrote:
> > > Currently a thread sleeping on a mutex wait queue can be delayed
> > > indefinitely by other threads managing to steal the lock, that is,
> > > acquiring the lock out-of-order before the sleepers. I noticed this via
> > > a testcase (see the Reference: below) where one CPU was unlocking /
> > > relocking a mutex in a tight loop while another CPU was delayed
> > > indefinitely trying to wake up and get the lock but losing out to the
> > > first CPU and going back to sleep:
> > >
> > > CPU0:                        CPU1:
> > > mutex_lock->acquire
> > >                              mutex_lock->sleep
> > > mutex_unlock->wake CPU1
> > >                              wakeup
> > > mutex_lock->acquire
> > >                              trylock fail->sleep
> > > mutex_unlock->wake CPU1
> > >                              wakeup
> > > mutex_lock->acquire
> > >                              trylock fail->sleep
> > > ...                          ...
> > >
> > > To fix this we can make sure that CPU1 makes progress by avoiding the
> > > fastpath locking, optimistic spinning and trylocking if there is any
> > > waiter on the list.  The corresponding check can be done without holding
> > > wait_lock, since the goal is only to make sure sleepers make progress
> > > and not to guarantee that the locking will happen in FIFO order.
> >
> > I think we went over this before; that will also completely destroy
> > performance under a number of workloads.
>
> Yup, once a thread becomes a waiter, all other threads will need to
> follow suit, so this change would effectively disable optimistic
> spinning in some workloads.
>
> A few months ago, we worked on patches that allow the waiter to return
> to optimistic spinning to help reduce starvation. Longman sent out a
> version 3 patch set, and it sounded like we were fine with the concept.
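
As a rough illustration of the check proposed in the RFC, here is a single-threaded userspace model in C11. It is a sketch under stated assumptions: `struct toy_mutex`, `toy_trylock()`, `replay()` and the `block_steal` switch are hypothetical names, one `toy_trylock()` call stands in for the fastpath, optimistic spinning and trylock paths alike, and the interleaving from the diagram is replayed deterministically rather than raced on real CPUs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy lock; not the kernel's struct mutex. */
struct toy_mutex {
	atomic_bool locked;
	atomic_int nr_waiters;	/* threads sleeping on the wait list */
};

/*
 * Stands in for the fastpath, optimistic spinning and trylock paths.
 * With block_steal set, a thread that is not already a waiter backs off
 * whenever someone is sleeping. The waiter count is read locklessly
 * (relaxed load, no wait_lock), since only progress, not FIFO order,
 * is the goal.
 */
static bool toy_trylock(struct toy_mutex *m, bool is_waiter, bool block_steal)
{
	bool expected = false;

	if (block_steal && !is_waiter &&
	    atomic_load_explicit(&m->nr_waiters, memory_order_relaxed) > 0)
		return false;
	return atomic_compare_exchange_strong(&m->locked, &expected, true);
}

static void toy_unlock(struct toy_mutex *m)
{
	assert(atomic_load(&m->locked));	/* must currently be held */
	atomic_store(&m->locked, false);
}

/*
 * Deterministically replays the CPU0/CPU1 interleaving: each round CPU0
 * unlocks (waking CPU1) and tries to relock before CPU1 gets to run.
 * Returns the round in which CPU1 got the lock, or -1 if it starved.
 */
static int replay(int rounds, bool block_steal)
{
	struct toy_mutex m = { false, 0 };

	toy_trylock(&m, false, block_steal);	/* CPU0: mutex_lock->acquire */
	atomic_fetch_add(&m.nr_waiters, 1);	/* CPU1: mutex_lock->sleep */
	for (int i = 0; i < rounds; i++) {
		toy_unlock(&m);				/* CPU0: unlock, wake CPU1 */
		toy_trylock(&m, false, block_steal);	/* CPU0: lock again */
		if (toy_trylock(&m, true, block_steal)) {	/* CPU1: wakeup */
			atomic_fetch_sub(&m.nr_waiters, 1);
			return i;
		}
		/* CPU1: trylock fail->sleep */
	}
	return -1;
}
```

In this model `replay(1000, false)` returns -1 (CPU1 never wins), while `replay(1000, true)` returns 0. The flip side, as noted above, is that once any thread sleeps, every newcomer is pushed off the fast paths as well, which is where the performance cost comes from.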

Thanks. With the v4 he just sent, I couldn't trigger the above problem.

However, this only works if mutex spinning is enabled. If it's disabled,
I still hit the problem due to the other forms of lock stealing
(fastpath locking and trylocking). So could we prevent these when mutex
spinning is disabled anyway?
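
One way to read this question, sketched as a hypothetical helper: keep lock stealing when spinning is configured (where the v4 series lets a woken waiter go back to spinning) and disallow it when there are waiters and spinning is disabled. `may_steal()` and its `spin_enabled` parameter are illustrative assumptions; in the kernel the switch would presumably be compile-time, via `CONFIG_MUTEX_SPIN_ON_OWNER`, rather than a runtime flag.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical policy helper: may a thread that is not already a waiter
 * take the lock out of order (fastpath or trylock)?
 *
 * spin_enabled: illustrative stand-in for CONFIG_MUTEX_SPIN_ON_OWNER.
 * nr_waiters:   number of threads currently sleeping on the wait list.
 */
static bool may_steal(bool spin_enabled, int nr_waiters)
{
	assert(nr_waiters >= 0);

	/* Nobody is sleeping, so nobody can be starved. */
	if (nr_waiters == 0)
		return true;

	/*
	 * With spinning enabled, a woken waiter can go back to optimistic
	 * spinning instead of straight back to sleep, so stealing stays
	 * allowed for performance. Without spinning there is no such
	 * escape, so back off and let the sleeper make progress.
	 */
	return spin_enabled;
}
```

The design choice is to confine the performance cost to configurations that have no spinning to lose, leaving the spinning case to the v4 approach.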

--Imre