Re: [Question]: try to fix contention between expire_timers and try_to_del_timer_sync
From: Thomas Gleixner
Date: Wed Jul 26 2017 - 10:16:49 EST
On Wed, 26 Jul 2017, qiaozhou wrote:
Cc'ed ARM folks.
> I want to ask you for suggestions about how to fix one contention between
> expire_timers and try_to_del_timer_sync. Thanks in advance.
> The issue is a hard-lockup issue detected on our platform(arm64, one cluster
> with 4 a53, and the other with 4 a73). The sequence is as below:
> 1. core0 checks expired timers, and wakes up a process on a73, in step 1.3.
> 2. core4 starts to run the process and try to delete the timer in sync method,
> in step 2.1.
> 3. Before core0 can run to 1.4 and get the lock, core4 has already run from
> 2.1 to 2.6 in while loop. And in step 2.4, it fails since the timer is still
> the running timer.
>
> core0(a53): core4(a73):
> run_timer_softirq
> __run_timers
> spin_lock_irq(&base->lock)
> expire_timers()
> 1.1: base->running_timer = timer;
> 1.2: spin_unlock_irq(&base->lock);
> 1.3: call_timer_fn(timer, fn, data);
> schedule()
> 1.4: spin_lock_irq(&base->lock); del_timer_sync(&timer)
> 1.5: back to 1.1 in while loop
> 2.1: try_to_del_timer_sync()
> 2.2: get_timer_base()
> 2.3: spin_lock_irqsave(&base->lock, flags);
> 2.4: check "base->running_timer != timer"
> 2.5: spin_unlock_irqrestore(&base->lock, flags);
> 2.6:cpu_relax(); (back to 2.1 in while loop)
This is horribly formatted.
> Core0 runs @832MHz, and core4 runs @1.8GHz. A73 is also much more powerful in
> design than a53. So the actual running is that a73 keeps looping in spin_lock
> -> check running_timer -> spin_unlock -> spin_lock ->...., while a53 can't
> even succeed to complete one store exclusive instruction, before it can enter
> WFE to wait for lock release. It just keeps looping in +30 and+3c in below
> asm, due to stxr fails.
>
> So it deadloop, a53 can't jump out ldaxr/stxr loop, while a73 can't pass the
> running_timer check. Finally the hard-lockup is triggered.(BTW, the issue
> needs a long time to occur.)
>
> <_raw_spin_lock_irq+0x2c>: prfm pstl1strm, [x19]
> /<_raw_spin_lock_irq+0x30>: ldaxr w0, [x19]//
> //<_raw_spin_lock_irq+0x34>: add w1, w0, #0x10, lsl #12//
> //<_raw_spin_lock_irq+0x38>: stxr w2, w1, [x19]//
> //<_raw_spin_lock_irq+0x3c>: cbnz w2, 0xffffff80089f52e0
> <_raw_spin_lock_irq+0x30>/
> <_raw_spin_lock_irq+0x40>: eor w1, w0, w0, ror #16
> <_raw_spin_lock_irq+0x44>: cbz w1, 0xffffff80089f530c
> <_raw_spin_lock_irq+0x5c>
> <_raw_spin_lock_irq+0x48>: sevl
> <_raw_spin_lock_irq+0x4c>: wfe
> <_raw_spin_lock_irq+0x50>: ldaxrh w2, [x19]
> <_raw_spin_lock_irq+0x54>: eor w1, w2, w0, lsr #16
> <_raw_spin_lock_irq+0x58>: cbnz w1, 0xffffff80089f52fc
> <_raw_spin_lock_irq+0x4c>
>
> The loop on a53 only has 4 instructions, and loop on a73 has ~100
> instructions. Still a53 has no chance to store exclusive successfully. It may
> be related with ldaxr/stxr cost, core frequency, core number etc.
>
> I have no idea of fixing it in spinlock/unlock implement, so I try to fix it
> in timer driver. I prepared a raw patch, not sure it's the correct direction
> to solve this issue. Could you help to give some suggestions? Thanks.
>
> From fb8fbfeb8f9f92fdadd9920ce234fd433bc883e1 Mon Sep 17 00:00:00 2001
> From: Qiao Zhou <qiaozhou@xxxxxxxxxxxx>
> Date: Wed, 26 Jul 2017 20:30:33 +0800
> Subject: [PATCH] RFC: timers: try to fix contention between expire_timers and
> del_timer_sync
>
> try to fix contention between expire_timers and del_timer_sync by
> adding TIMER_WOKEN status, which means that the timer has already
> woken up corresponding process.
Timers do a lot more things than waking up a process.
Just because this happens in a particular case for you where a wakeup is
involved this does not mean, that this is generally true.
And that whole flag business is completely broken. There is a comment in
call_timer_fn() which says:
/*
* It is permissible to free the timer from inside the
* function that is called from it ....
So you _CANNOT_ touch timer after the function returned. It might be gone
already.
For that particular timer case we can clear base->running_timer w/o the
lock held (see patch below), but this kind of
lock -> test -> unlock -> retry
loops are all over the place in the kernel, so this is going to hurt you
sooner than later in some other place.
Thanks,
tglx
8<------------
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1301,10 +1301,12 @@ static void expire_timers(struct timer_b
if (timer->flags & TIMER_IRQSAFE) {
raw_spin_unlock(&base->lock);
call_timer_fn(timer, fn, data);
+ base->running_timer = NULL;
raw_spin_lock(&base->lock);
} else {
raw_spin_unlock_irq(&base->lock);
call_timer_fn(timer, fn, data);
+ base->running_timer = NULL;
raw_spin_lock_irq(&base->lock);
}
}