Re: [PATCH v2 1/2] tick: Remove unreasonable detached state set in tick_shutdown()

From: Bibo Mao
Date: Thu Sep 04 2025 - 22:06:38 EST




On 2025/9/4 11:57 PM, Thomas Gleixner wrote:
On Thu, Sep 04 2025 at 15:17, Bibo Mao wrote:
clockevents_switch_state() checks whether the device is already in the
requested state and does nothing if it is.

tick_shutdown() first sets the detached state and then calls
clockevents_exchange_device(), which calls clockevents_switch_state().
Since the device is already marked detached,
clockevents_switch_state() does nothing, so the tick timer device is
not shut down when the CPU goes offline. In a guest VM, the timer
interrupt then prevents the vCPU from sleeping after it has been
hot-removed.

Remove the state assignment before the call to
clockevents_exchange_device(). The state is set in
clockevents_switch_state() if the switch succeeds.

This explanation is incomplete. tick_shutdown() did this because it was
originally invoked on a live CPU and not on the outgoing CPU.

That got changed in

3b1596a21fbf ("clockevents: Shutdown and unregister current clockevents at CPUHP_AP_TICK_DYING")

which is the actual root cause.

The pile of 'Fixes:' below is just enumerating the subsequent problems.

Fixes: bf9a001fb8e4 ("clocksource/drivers/timer-tegra: Remove clockevents shutdown call on offlining")
Fixes: cd165ce8314f ("clocksource/drivers/qcom: Remove clockevents shutdown call on offlining")
Fixes: 30f8c70a85bc ("clocksource/drivers/armada-370-xp: Remove clockevents shutdown call on offlining")
Fixes: ba23b6c7f974 ("clocksource/drivers/exynos_mct: Remove clockevents shutdown call on offlining")
Fixes: 15b810e0496e ("clocksource/drivers/arm_global_timer: Remove clockevents shutdown call on offlining")
Fixes: 78b5c2ca5f27 ("clocksource/drivers/arm_arch_timer: Remove clockevents shutdown call on offlining")
Fixes: 900053d9eedf ("ARM: smp_twd: Remove clockevents shutdown call on offlining")

Signed-off-by: Bibo Mao <maobibo@xxxxxxxxxxx>
Reviewed-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
---
kernel/time/tick-common.c | 5 -----
1 file changed, 5 deletions(-)

diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index 9a3859443c04..eb9b777f5492 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -424,11 +424,6 @@ void tick_shutdown(unsigned int cpu)
 	td->mode = TICKDEV_MODE_PERIODIC;
 	if (dev) {
-		/*
-		 * Prevent that the clock events layer tries to call
-		 * the set mode function!
-		 */
-		clockevent_set_state(dev, CLOCK_EVT_STATE_DETACHED);
 		clockevents_exchange_device(dev, NULL);
 		dev->event_handler = clockevents_handle_noop;
 		td->evtdev = NULL;

Can this pretty please clean up the misleading comment above
tick_shutdown() as well?

* Shutdown an event device on a given cpu:
*
* This is called on a life CPU, when a CPU is dead. So we cannot
* access the hardware device itself.
* We just set the mode and remove it from the lists.

That should have been removed or updated with 3b1596a21fbf too, no?

With that the cpu argument is no longer useful either, because this is
now guaranteed to be invoked on the outgoing CPU, no?

It is not easy for me to word the comments with my poor English :(
How about a patch like this?

clockevents_switch_state() checks whether the device is already in the
requested state and does nothing if it is.

tick_shutdown() first sets the detached state and then calls
clockevents_exchange_device(), which calls clockevents_switch_state().
Since the device is already marked detached,
clockevents_switch_state() does nothing, so the tick timer device is
not shut down when the CPU goes offline.

tick_shutdown() did this because it was originally invoked on a live
CPU and not on the outgoing CPU. Now that it is called on the outgoing
CPU, the hardware device can be accessed.

Remove the state assignment before the call to
clockevents_exchange_device(). The state is set in
clockevents_switch_state() if the switch succeeds.

Fixes: 3b1596a21fbf ("clockevents: Shutdown and unregister current clockevents at CPUHP_AP_TICK_DYING")


 /*
- * Shutdown an event device on a given cpu:
+ * Shutdown an event device on the outgoing CPU:
  *
- * This is called on a life CPU, when a CPU is dead. So we cannot
- * access the hardware device itself.
- * We just set the mode and remove it from the lists.
+ * Called by the dying CPU during teardown, with clockevents_lock held
+ * and interrupts disabled.
  */
-void tick_shutdown(unsigned int cpu)
+void tick_shutdown(void)
 {
-	struct tick_device *td = &per_cpu(tick_cpu_device, cpu);
+	struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
 	struct clock_event_device *dev = td->evtdev;
 
 	td->mode = TICKDEV_MODE_PERIODIC;
 	if (dev) {
-		/*
-		 * Prevent that the clock events layer tries to call
-		 * the set mode function!
-		 */
-		clockevent_set_state(dev, CLOCK_EVT_STATE_DETACHED);
 		clockevents_exchange_device(dev, NULL);
 		dev->event_handler = clockevents_handle_noop;
 		td->evtdev = NULL;
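
To illustrate the ordering problem described in the changelog, here is a
minimal userspace sketch. It is not kernel code: the model_* names, the
hw_running flag and the simplified state enum are all hypothetical
stand-ins, and the only behaviour it assumes from the kernel is the one
stated above, namely that the state switch returns early when the device
is already in the requested state.

```c
/* Userspace model only. All model_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_state { MODEL_DETACHED, MODEL_SHUTDOWN, MODEL_PERIODIC };

struct model_dev {
	enum model_state state;	/* software view of the device state */
	bool hw_running;	/* stand-in for the real timer hardware */
};

/* Models the early return: same state requested, nothing is done. */
static void model_switch_state(struct model_dev *dev, enum model_state state)
{
	assert(dev != NULL);
	if (dev->state == state)
		return;
	/* Assumption: detaching or shutting down stops the hardware. */
	dev->hw_running = (state == MODEL_PERIODIC);
	dev->state = state;
}

/* Old ordering: mark the device detached first, then ask for the switch. */
static void model_shutdown_old(struct model_dev *dev)
{
	dev->state = MODEL_DETACHED;
	model_switch_state(dev, MODEL_DETACHED);	/* returns early */
}

/* New ordering: let the switch function update the state itself. */
static void model_shutdown_new(struct model_dev *dev)
{
	model_switch_state(dev, MODEL_DETACHED);
}
```

With a device that starts in MODEL_PERIODIC with hw_running set,
model_shutdown_old() leaves hw_running set, while model_shutdown_new()
clears it. That is the difference the patch is meant to make on the
outgoing CPU.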

Regards
Bibo Mao

Thanks,

tglx