Re: [PATCH v4] hrtimers: Force migrate away hrtimers queued after CPUHP_AP_HRTIMERS_DYING
From: Walter Chang (張維哲)
Date: Wed Mar 26 2025 - 01:46:59 EST
On Tue, 2025-01-21 at 09:08 -0800, Paul E. McKenney wrote:
> On Sat, Jan 18, 2025 at 12:24:33AM +0100, Frederic Weisbecker wrote:
> > hrtimers are migrated away from the dying CPU to any online target at
> > the CPUHP_AP_HRTIMERS_DYING stage in order not to delay bandwidth
> > timers handling tasks involved in the CPU hotplug forward progress.
> >
> > However, wake-ups can still be performed by the outgoing CPU after
> > CPUHP_AP_HRTIMERS_DYING. Those can again result in bandwidth timers
> > being armed. Depending on several considerations (crystal ball power
> > management based election, earliest timer already enqueued, timer
> > migration enabled or not), the target may eventually be the current
> > CPU even if offline. If that happens, the timer is eventually ignored.
> >
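To make the hazard described above concrete, here is a minimal userspace C
model of it. This is not kernel code: struct toy_cpu_base, its online flag
and enqueue_local() are stand-ins invented only to illustrate what happens
when a timer is queued locally on a CPU that is already past
CPUHP_AP_HRTIMERS_DYING.

#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 4

/* Toy stand-in for hrtimer_cpu_base: an online flag and a queue count. */
struct toy_cpu_base {
        bool online;
        int nr_queued;
};

static struct toy_cpu_base bases[NR_CPUS];

/* Naive enqueue: always queue on the CPU the caller happens to run on. */
static void enqueue_local(int this_cpu)
{
        bases[this_cpu].nr_queued++;
        if (!bases[this_cpu].online)
                printf("timer armed on offline CPU%d: it will never expire\n",
                       this_cpu);
}

int main(void)
{
        for (int cpu = 0; cpu < NR_CPUS; cpu++)
                bases[cpu].online = true;

        /* CPU3 has already passed CPUHP_AP_HRTIMERS_DYING... */
        bases[3].online = false;

        /* ...but a late wake-up on CPU3 still arms a bandwidth timer. */
        enqueue_local(3);
        return 0;
}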
> > The most notable example is RCU, which had to deal with each and every
> > one of those wake-ups by deferring them to an online CPU, along with
> > related workarounds:
> >
> > _ e787644caf76 (rcu: Defer RCU kthreads wakeup when CPU is dying)
> > _ 9139f93209d1 (rcu/nocb: Fix RT throttling hrtimer armed from offline CPU)
> > _ f7345ccc62a4 (rcu/nocb: Fix rcuog wake-up from offline softirq)
> >
> > The problem isn't confined to RCU, though, as the stop machine kthread
> > (which runs CPUHP_AP_HRTIMERS_DYING) reports its completion at the end
> > of its work through cpu_stop_signal_done() and performs a wake-up that
> > eventually arms the deadline server timer:
> >
> > WARNING: CPU: 94 PID: 588 at kernel/time/hrtimer.c:1086 hrtimer_start_range_ns+0x289/0x2d0
> > CPU: 94 UID: 0 PID: 588 Comm: migration/94 Not tainted
> > Stopper: multi_cpu_stop+0x0/0x120 <- stop_machine_cpuslocked+0x66/0xc0
> > RIP: 0010:hrtimer_start_range_ns+0x289/0x2d0
> > Call Trace:
> > <TASK>
> > ? hrtimer_start_range_ns
> > start_dl_timer
> > enqueue_dl_entity
> > dl_server_start
> > enqueue_task_fair
> > enqueue_task
> > ttwu_do_activate
> > try_to_wake_up
> > complete
> > cpu_stopper_thread
> > smpboot_thread_fn
> > kthread
> > ret_from_fork
> > ret_from_fork_asm
> > </TASK>
> >
> > Instead of providing yet another bandaid to work around the situation,
> > fix it in the hrtimers infrastructure itself: always migrate a timer to
> > an online target whenever it is enqueued from an offline CPU.
> >
> > This will also allow reverting all of the above disgraceful RCU hacks.
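The same toy model can sketch the enqueue-time migration described above;
again this is only an illustration, not the actual patch, and
pick_online_target() is an invented helper: when the local base is offline,
the enqueue path forces the timer onto any online CPU instead.

#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 4

struct toy_cpu_base {
        bool online;
        int nr_queued;
};

static struct toy_cpu_base bases[NR_CPUS];

/* Invented helper: pick any online CPU, preferring one other than ours. */
static int pick_online_target(int this_cpu)
{
        for (int cpu = 0; cpu < NR_CPUS; cpu++)
                if (cpu != this_cpu && bases[cpu].online)
                        return cpu;
        return this_cpu;        /* last resort: nothing else is online */
}

/* Enqueue that mirrors the fix: never leave a timer on an offline base. */
static void enqueue_migrate_if_offline(int this_cpu)
{
        int target = this_cpu;

        if (!bases[this_cpu].online)
                target = pick_online_target(this_cpu);

        bases[target].nr_queued++;
        printf("timer armed on CPU%d (enqueued from CPU%d)\n",
               target, this_cpu);
}

int main(void)
{
        for (int cpu = 0; cpu < NR_CPUS; cpu++)
                bases[cpu].online = true;

        /* CPU3 is past CPUHP_AP_HRTIMERS_DYING but still enqueues a timer. */
        bases[3].online = false;
        enqueue_migrate_if_offline(3);  /* ends up on an online CPU */
        return 0;
}

In the real kernel the equivalent check and target selection live inside the
hrtimer core at enqueue time, so callers such as the deadline server or RCU
do not have to special-case offline CPUs themselves.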
> >
> > Reported-by: Vlad Poenaru <vlad.wing@xxxxxxxxx>
> > Reported-by: Usama Arif <usamaarif642@xxxxxxxxx>
> > Fixes: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
> > Closes: 20241213203739.1519801-1-usamaarif642@xxxxxxxxx
> > Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
> > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
>
> This passes over-holiday rcutorture testing, so, perhaps redundantly:
>
> Tested-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
Hi,
I encountered the same issue even after applying this patch.
Below are the details of the warning and call trace.
migration/3: ------------[ cut here ]------------
migration/3: WARNING: CPU: 3 PID: 42 at kernel/time/hrtimer.c:1125 enqueue_hrtimer+0x7c/0xec
migration/3: CPU: 3 UID: 0 PID: 42 Comm: migration/3 Tainted: G OE 6.12.18-android16-0-g59cb5a849beb-4k #1 0b440e43fa7b24aaa3b7e6e5d2b938948e0cacdb
migration/3: Stopper: multi_cpu_stop+0x0/0x184 <- stop_machine_cpuslocked+0xc0/0x15c
migration/3: Call trace:
migration/3: enqueue_hrtimer+0x7c/0xec
migration/3: hrtimer_start_range_ns+0x54c/0x7b4
migration/3: start_dl_timer+0x110/0x188
migration/3: enqueue_dl_entity+0x478/0x80c
migration/3: dl_server_start+0xf8/0x194
migration/3: enqueue_task_fair+0x6c4/0x924
migration/3: enqueue_task+0x70/0x3a4
migration/3: do_activate_task+0xcc/0x178
migration/3: ttwu_do_activate+0xac/0x2e0
migration/3: try_to_wake_up+0x73c/0xaf4
migration/3: wake_up_process+0x18/0x28
migration/3: kick_pool+0xc4/0x170
migration/3: __queue_work+0x44c/0x698
migration/3: delayed_work_timer_fn+0x20/0x30
migration/3: call_timer_fn+0x4c/0x1d0
migration/3: __run_timer_base+0x278/0x350
migration/3: run_timer_softirq+0x78/0x9c
migration/3: handle_softirqs+0x154/0x42c
migration/3: __do_softirq+0x14/0x20
migration/3: ____do_softirq+0x10/0x20
migration/3: call_on_irq_stack+0x3c/0x74
migration/3: do_softirq_own_stack+0x1c/0x2c
migration/3: __irq_exit_rcu+0x5c/0xd4
migration/3: irq_exit_rcu+0x10/0x1c
migration/3: el1_interrupt+0x38/0x58
migration/3: el1h_64_irq_handler+0x18/0x24
migration/3: el1h_64_irq+0x68/0x6c
migration/3: multi_cpu_stop+0x15c/0x184
migration/3: cpu_stopper_thread+0xf8/0x1b0
migration/3: smpboot_thread_fn+0x204/0x304
migration/3: kthread+0x110/0x1a4
migration/3: ret_from_fork+0x10/0x20
migration/3: ---[ end trace 0000000000000000 ]---
kworker/1:1: psci: CPU0 killed (polled 0 ms)