Re: [Problem] Cache line starvation

From: Catalin Marinas
Date: Wed Oct 03 2018 - 03:52:04 EST


On Fri, 21 Sep 2018 at 13:22, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Fri, Sep 21, 2018 at 02:02:26PM +0200, Sebastian Andrzej Siewior wrote:
> > We reproducibly observe cache line starvation on a Core2Duo E6850 (2
> > cores), a i5-6400 SKL (4 cores) and on a NXP LS2044A ARM Cortex-A72 (4
> > cores).
> >
> > The problem can be triggered with a v4.9-RT kernel by starting
>
> > Daniel reported that disabling ticket locks on 4.4 makes the problem go
> > away, but he hasn't run a long time test yet and as we saw with 4.14 it can
> > take quite a while.
>
> On 4.4 and 4.9 ARM64 still uses ticket locks. So I'm very interested to
> know if the ticket locks on x86 really fix or just make it harder.
>
> I've been looking at qspinlock in the light of this and there is indeed
> room for improvement. The ticket lock certainly is much simpler.

FWIW, in the qspinlock TLA+ model [1], if I replace the
atomic_fetch_or() model with a try_cmpxchg() loop, it violates the
liveness properties with only 2 CPUs: one CPU keeps locking/unlocking,
hence changing the lock value, while the other repeatedly fails the
cmpxchg. Your latest qspinlock patches seem to address this (I couldn't
get the model to fail, but it only assumes sequential consistency). I'm
not sure that's what Sebastian is seeing, but without your proposed
qspinlock changes, ticket spinlocks may be a better bet for RT.
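
To make the fetch_or vs. cmpxchg-loop difference concrete, here is a
minimal user-space sketch in C11 atomics. It is neither the TLA+ model
nor kernel code: the LOCKED/PENDING bit values, the function names and
the "interfere" callback are all made up for illustration, and the
interleaving is forced deterministically in a single thread rather than
observed on real hardware.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bit layout only, not taken from the kernel headers. */
#define LOCKED  0x001u
#define PENDING 0x100u

/* Unconditional RMW: sets PENDING in one step, whatever the lock holds. */
static unsigned set_pending_fetch_or(atomic_uint *lock)
{
	return atomic_fetch_or(lock, PENDING);
}

/* Hypothetical "other CPU": one lock or unlock, i.e. LOCKED flips. */
static void toggle_locked(atomic_uint *lock)
{
	atomic_fetch_xor(lock, LOCKED);
}

/*
 * Conditional RMW: only succeeds if the lock word is unchanged since it
 * was last read. 'interfere' (may be NULL) runs before every attempt to
 * model the other CPU winning the race each time. Gives up after
 * max_tries failures and reports the failure count in *fails.
 */
static bool set_pending_cmpxchg(atomic_uint *lock,
				void (*interfere)(atomic_uint *),
				int max_tries, int *fails)
{
	unsigned old = atomic_load(lock);

	assert(max_tries > 0);
	*fails = 0;
	while (*fails < max_tries) {
		if (interfere)
			interfere(lock);
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_strong(lock, &old, old | PENDING))
			return true;
		(*fails)++;
	}
	return false;
}
```

With toggle_locked() as the interferer, the expected value is always
one toggle behind, so set_pending_cmpxchg() never succeeds within its
budget, whereas set_pending_fetch_or() cannot fail by construction.
Real hardware will not interleave this adversarially on every attempt;
the sketch only shows why nothing in the cmpxchg loop itself guarantees
forward progress.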

[1] https://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/kernel-tla.git/tree/qspinlock.tla

--
Catalin