Re: [Problem] Cache line starvation
From: Peter Zijlstra
Date: Wed Oct 03 2018 - 04:23:29 EST
On Wed, Oct 03, 2018 at 08:51:50AM +0100, Catalin Marinas wrote:
> On Fri, 21 Sep 2018 at 13:22, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > On Fri, Sep 21, 2018 at 02:02:26PM +0200, Sebastian Andrzej Siewior wrote:
> > > We reproducibly observe cache line starvation on a Core2Duo E6850 (2
> > > cores), a i5-6400 SKL (4 cores) and on a NXP LS2044A ARM Cortex-A72 (4
> > > cores).
> > >
> > > The problem can be triggered with a v4.9-RT kernel by starting
> >
> > > Daniel reported that disabling ticket locks on 4.4 makes the problem go
> > > away, but he hasn't run a long time test yet and as we saw with 4.14 it can
> > > take quite a while.
> >
> > On 4.4 and 4.9 ARM64 still uses ticket locks. So I'm very interested to
> > know if the ticket locks on x86 really fix or just make it harder.
> >
> > I've been looking at qspinlock in the light of this and there is indeed
> > room for improvement. The ticket lock certainly is much simpler.
>
> FWIW, in the qspinlock TLA+ model [1], if I replace the
> atomic_fetch_or() model with a try_cmpxchg loop, it violates the
> liveness properties with only 2 CPUs as one keeps locking/unlocking,
> hence changing the lock value, while the other repeatedly fails the
> cmpxchg. Your latest qspinlock patches seem to address this (couldn't
> get it to fail but the model is only sequentially consistent). Not
> sure that's what Sebastian is seeing but without your proposed
> qspinlock changes, ticket spinlocks may be a better bet for RT.
Right, and agreed. I did raise that point when you initially proposed
that fetch_or() for liveness.
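
To make the difference concrete, here is a minimal userspace sketch using
C11 atomics rather than the kernel's atomic_t API. It is not the qspinlock
code: the bit layout and the names set_pending_fetch_or(),
set_pending_cmpxchg() and toggle_locked() are made up for illustration, and
the "other CPU" is simulated by a callback that runs between the load and
the cmpxchg, so the outcome is deterministic.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit layout only; not the kernel's qspinlock layout. */
#define LOCKED  0x001u
#define PENDING 0x100u

/*
 * Unconditional RMW: completes in a single atomic operation no matter
 * how often another CPU changes the lock word. Returns the old value.
 */
static uint32_t set_pending_fetch_or(_Atomic uint32_t *lock)
{
	return atomic_fetch_or_explicit(lock, PENDING, memory_order_acquire);
}

/* Stand-in for the other CPU: flips LOCKED, i.e. locks or unlocks. */
static void toggle_locked(_Atomic uint32_t *lock)
{
	atomic_fetch_xor_explicit(lock, LOCKED, memory_order_relaxed);
}

/*
 * cmpxchg loop: an attempt only succeeds if the word is unchanged since
 * it was last read. 'interfere' (hypothetical hook, may be NULL) runs
 * before every attempt; 'max_tries' bounds the loop so that the failure
 * is observable instead of spinning forever.
 */
static bool set_pending_cmpxchg(_Atomic uint32_t *lock, unsigned int max_tries,
				void (*interfere)(_Atomic uint32_t *))
{
	uint32_t old = atomic_load_explicit(lock, memory_order_relaxed);

	for (unsigned int i = 0; i < max_tries; i++) {
		if (interfere)
			interfere(lock);	/* other CPU changes the word */

		if (atomic_compare_exchange_strong_explicit(lock, &old,
							    old | PENDING,
							    memory_order_acquire,
							    memory_order_relaxed))
			return true;
		/* On failure 'old' holds the current value; try again. */
	}
	return false;
}
```

With toggle_locked() as the interference, set_pending_cmpxchg() fails on
every attempt because the word has always changed by the time the cmpxchg
runs, whereas set_pending_fetch_or() sets the bit in one go. Real hardware
only produces that interleaving some of the time, which would fit with it
taking a long while to trigger.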