[PATCH] timers/migration: Fix livelock in tmigr_handle_remote_up()
From: Amit Matityahu
Date: Wed Jun 03 2026 - 13:04:29 EST
tmigr_handle_remote_cpu() skips timer_expire_remote() when cpu ==
smp_processor_id(), assuming the local softirq path already handled
this CPU's timers.

This assumption breaks when jiffies advances between
run_timer_base(BASE_GLOBAL) and tmigr_handle_remote() in the same
softirq invocation - a timer expires after the wheel ran but before
the hierarchy snapshot is taken.

The stranded timer is never collected, so
fetch_next_timer_interrupt_remote() keeps reporting it as expired
and the event is re-queued with expires == now on each iteration.
The goto-again loop in tmigr_handle_remote_up() then spins
indefinitely.

Fix by calling timer_expire_remote() unconditionally.
__run_timer_base() already returns early when there is nothing to
expire, making this a no-op in the common case.

Fixes: 7ee988770326 ("timers: Implement the hierarchical pull model")
Cc: stable@xxxxxxxxxxxxxxx
Reported-by: Alon Kariv <alonka@xxxxxxxxxx>
Cc: Jonathan Chocron <jonnyc@xxxxxxxxxx>
Cc: Akram Baransi <abaransi@xxxxxxxxxx>
Cc: David Woodhouse <dwmw@xxxxxxxxxxxx>
Signed-off-by: Amit Matityahu <amitmat@xxxxxxxxxx>
---
Questions for maintainers:

1. What was the original rationale for the cpu != smp_processor_id()
   check? Neither a code comment, the commit message, nor the original
   patch's email discussion explains why timer_expire_remote() is
   skipped for the local CPU.

2. There seems to be a design tension where a CPU can have timers
   visible in the migration hierarchy while simultaneously running its
   own local softirq. Is the expectation that run_timer_base() always
   drains everything before tmigr_handle_remote() sees it, or should
   the remote path handle local-CPU timers as a fallback?

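For illustration only, here is a minimal userspace model of the retry
loop described in the changelog. It is a sketch, not kernel code: the
names model_base, model_expire() and model_handle_remote() are
hypothetical, the loop is capped at max_iter so the model terminates,
and the single pending flag stands in for the real timer wheel and
hierarchy state. It only shows why skipping the expiry step for the
local CPU leaves the "still expired" condition true forever.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one CPU's global timer base. */
struct model_base {
	uint64_t next_expiry;	/* earliest pending timer, in model ticks */
	bool pending;		/* true while a timer is still queued */
};

/* Models timer_expire_remote(): returns early when nothing is due. */
static void model_expire(struct model_base *base, uint64_t now)
{
	if (!base->pending || now < base->next_expiry)
		return;
	base->pending = false;	/* the timer callback would run here */
}

/*
 * Models the retry loop: iterate while the base still reports an
 * expired timer. Returns the number of iterations, capped at max_iter.
 * skip_local == true models the cpu != smp_processor_id() check.
 */
static unsigned int model_handle_remote(struct model_base *base, uint64_t now,
					bool skip_local, unsigned int max_iter)
{
	unsigned int iter = 0;

	while (iter < max_iter) {
		iter++;
		if (!skip_local)
			model_expire(base, now);
		/* Models the re-evaluation of the next expiry. */
		if (!(base->pending && base->next_expiry <= now))
			break;
		/* Still expired: the event is re-queued and the loop retries. */
	}
	return iter;
}
```

With a timer due at now and skip_local set, the model runs until the
cap is hit; with skip_local cleared it finishes in one iteration. When
no timer is due, model_expire() returns early and the loop also exits
after one pass, which corresponds to the no-op case mentioned above.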
kernel/time/timer_migration.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
index 1d0d3a4058d5..298c34c942ae 100644
--- a/kernel/time/timer_migration.c
+++ b/kernel/time/timer_migration.c
@@ -978,8 +978,7 @@ static void tmigr_handle_remote_cpu(unsigned int cpu, u64 now,
 	/* Drop the lock to allow the remote CPU to exit idle */
 	raw_spin_unlock_irq(&tmc->lock);
 
-	if (cpu != smp_processor_id())
-		timer_expire_remote(cpu);
+	timer_expire_remote(cpu);
 
 	/*
 	 * Lock ordering needs to be preserved - timer_base locks before tmigr
base-commit: e43ffb69e0438cddd72aaa30898b4dc446f664f8
--
2.47.3