[PATCH v2] tick/broadcast: Ensure the timer device of hrtimer broadcast is enabled

From: Yu Liao
Date: Thu Jul 11 2024 - 08:52:16 EST


It was found that running the LTP hotplug stress test on an aarch64
machine could produce rcu_sched stall warnings.

The issue is the following:

CPU1 (owns the broadcast hrtimer)   CPU2

                                    tick_broadcast_enter()
                                      //shutdown local timer device
                                      broadcast_shutdown_local()
                                    ...
                                    tick_broadcast_exit()
                                      clockevents_switch_state(dev, CLOCK_EVT_STATE_ONESHOT)
                                      //timer device remains shutdown
                                      cpumask_set_cpu(cpu, tick_broadcast_force_mask)

                                    initiates offlining of CPU1
take_cpu_down()
/*
 * CPU1 shuts down and does not
 * send broadcast IPI anymore
 */
                                    takedown_cpu()
                                      hotplug_cpu__broadcast_tick_pull()
                                        //move broadcast hrtimer to this CPU
                                        clockevents_program_event()
                                          bc_set_next()
                                            hrtimer_start()
                                            /*
                                             * timer device remains shutdown,
                                             * because only the first expiring
                                             * timer will trigger clockevent
                                             * device reprogramming
                                             */

What happens is that CPU2 exits broadcast mode with the force bit set, so
the local timer device is not reprogrammed and the expired event is
expected to be handled by the broadcast mechanism. This cannot happen,
because CPU1 is being offlined by CPU2. The clockevent is switched to
ONESHOT state, but some devices, such as the arm arch timer, do not
implement a set_state_oneshot handler, so the switch does nothing except
change the value of dev->state_use_accessors.
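As a rough illustration of the ONESHOT switch described above, the
following user-space sketch models a clockevent device whose driver may or
may not provide a set_state_oneshot hook. All names (struct
model_clockevent, model_switch_to_oneshot(), model_hw_set_oneshot(),
hw_enabled) are hypothetical and only approximate the kernel's behavior;
in particular, a hook that re-enables the hardware is an assumption made
for contrast, not a description of any real driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a clockevent device, not the kernel's struct. */
enum model_state { MODEL_SHUTDOWN, MODEL_ONESHOT };

struct model_clockevent {
	/* Bookkeeping only, loosely like dev->state_use_accessors. */
	enum model_state state;
	/* Whether the (modeled) hardware timer is enabled and able to fire. */
	bool hw_enabled;
	/* Optional driver hook; NULL models a driver without set_state_oneshot. */
	int (*set_state_oneshot)(struct model_clockevent *dev);
};

/* Hypothetical hook for a driver that re-enables its hardware on the switch. */
static int model_hw_set_oneshot(struct model_clockevent *dev)
{
	dev->hw_enabled = true;
	return 0;
}

/* Models the state switch: call the driver hook if present, then record the state. */
static void model_switch_to_oneshot(struct model_clockevent *dev)
{
	assert(dev != NULL);
	if (dev->set_state_oneshot)
		dev->set_state_oneshot(dev);
	dev->state = MODEL_ONESHOT;
}
```

With set_state_oneshot left NULL, model_switch_to_oneshot() only updates
the bookkeeping state, so hw_enabled stays false even though the state
reads ONESHOT.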

After CPU2 takes over the broadcast duty, it is still unable to handle
broadcasting by itself, because its local timer device remains shut down:
only the first expiring timer triggers clockevent device reprogramming.
In the worst case, all CPUs get stuck.

Fix this issue by reprogramming the local timer device if the clockevent
device of the CPU that takes over the broadcast timer is shut down. As this
CPU now owns the broadcast timer, also clear its bit in the force mask.

Signed-off-by: Yu Liao <liaoyu15@xxxxxxxxxx>
---
Changes in v2:
- Move the check to hotplug_cpu__broadcast_tick_pull()
- Remove the condition 'expires >= next_event'
- Link to v1: https://lore.kernel.org/all/20231218025844.55675-1-liaoyu15@xxxxxxxxxx/

kernel/time/tick-broadcast.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 771d1e040303..0edff1e46b7c 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -1141,6 +1141,7 @@ void tick_broadcast_switch_to_oneshot(void)
 #ifdef CONFIG_HOTPLUG_CPU
 void hotplug_cpu__broadcast_tick_pull(int deadcpu)
 {
+	struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
 	struct clock_event_device *bc;
 	unsigned long flags;
 
@@ -1148,6 +1149,21 @@ void hotplug_cpu__broadcast_tick_pull(int deadcpu)
 	bc = tick_broadcast_device.evtdev;
 
 	if (bc && broadcast_needs_cpu(bc, deadcpu)) {
+		/*
+		 * If the broadcast force bit is set, then we haven't
+		 * reprogrammed the local timer device, so it remains shut down.
+		 * clockevents_program_event() will start a hrtimer when
+		 * the broadcast device is based on hrtimer, and only the
+		 * first expiring timer will trigger clockevent device
+		 * reprogramming.
+		 *
+		 * Reprogram the cpu local timer device to avoid it remaining shut down.
+		 */
+		if (tick_check_broadcast_expired()) {
+			cpumask_clear_cpu(smp_processor_id(), tick_broadcast_force_mask);
+			tick_program_event(td->evtdev->next_event, 1);
+		}
+
 		/* This moves the broadcast assignment to this CPU: */
 		clockevents_program_event(bc, bc->next_event, 1);
 	}
--
2.33.0