[PATCH v2] timekeeping: Allow runtime PM from change_clocksource()

From: Niklas Söderlund
Date: Thu Feb 11 2021 - 09:12:57 EST


The struct clocksource callbacks enable() and disable() are described as
a way to allow clock sources to enter a power save mode [1]. But using
runtime PM from these callbacks triggers a cyclic lockdep warning when
switching clock source using change_clocksource().

This change allows the new clocksource to be enabled() and the old one
to be disabled() without holding the timekeeper_lock. This solution is
modeled on timekeeping_resume() and timekeeping_suspend() where the
struct clocksource resume() and suspend() callbacks are called without
holding the timekeeper_lock.

Triggering and log of the deadlock warning,

# echo e60f0000.timer > /sys/devices/system/clocksource/clocksource0/current_clocksource
[ 118.081095] ======================================================
[ 118.087455] WARNING: possible circular locking dependency detected
[ 118.093821] 5.10.0-rc6-arm64-renesas-00002-ga0cf8106e584 #37 Not tainted
[ 118.100712] ------------------------------------------------------
[ 118.107069] migration/0/11 is trying to acquire lock:
[ 118.112269] ffff0000403ed220 (&dev->power.lock){-...}-{2:2}, at: __pm_runtime_resume+0x40/0x74
[ 118.121172]
[ 118.121172] but task is already holding lock:
[ 118.127170] ffff8000113c8f88 (tk_core.seq.seqcount){----}-{0:0}, at: multi_cpu_stop+0xa4/0x190
[ 118.136061]
[ 118.136061] which lock already depends on the new lock.
[ 118.136061]
[ 118.144461]
[ 118.144461] the existing dependency chain (in reverse order) is:
[ 118.152149]
[ 118.152149] -> #2 (tk_core.seq.seqcount){----}-{0:0}:
[ 118.158900] lock_acquire.part.0+0x120/0x330
[ 118.163834] lock_acquire+0x64/0x80
[ 118.167971] seqcount_lockdep_reader_access.constprop.0+0x74/0x100
[ 118.174865] ktime_get+0x28/0xa0
[ 118.178729] hrtimer_start_range_ns+0x210/0x2dc
[ 118.183934] generic_sched_clock_init+0x70/0x88
[ 118.189134] sched_clock_init+0x40/0x64
[ 118.193622] start_kernel+0x494/0x524
[ 118.197926]
[ 118.197926] -> #1 (hrtimer_bases.lock){-.-.}-{2:2}:
[ 118.204491] lock_acquire.part.0+0x120/0x330
[ 118.209424] lock_acquire+0x64/0x80
[ 118.213557] _raw_spin_lock_irqsave+0x7c/0xc4
[ 118.218579] hrtimer_start_range_ns+0x68/0x2dc
[ 118.223690] rpm_suspend+0x308/0x5dc
[ 118.227909] rpm_idle+0xc4/0x2a4
[ 118.231771] pm_runtime_work+0x98/0xc0
[ 118.236171] process_one_work+0x294/0x6f0
[ 118.240836] worker_thread+0x70/0x45c
[ 118.245143] kthread+0x154/0x160
[ 118.249007] ret_from_fork+0x10/0x20
[ 118.253222]
[ 118.253222] -> #0 (&dev->power.lock){-...}-{2:2}:
[ 118.259607] check_noncircular+0x128/0x140
[ 118.268774] __lock_acquire+0x13b0/0x204c
[ 118.277780] lock_acquire.part.0+0x120/0x330
[ 118.287001] lock_acquire+0x64/0x80
[ 118.295375] _raw_spin_lock_irqsave+0x7c/0xc4
[ 118.304623] __pm_runtime_resume+0x40/0x74
[ 118.313644] sh_cmt_start+0x1c4/0x260
[ 118.322275] sh_cmt_clocksource_enable+0x28/0x50
[ 118.331891] change_clocksource+0x9c/0x160
[ 118.340910] multi_cpu_stop+0xa4/0x190
[ 118.349522] cpu_stopper_thread+0x90/0x154
[ 118.358429] smpboot_thread_fn+0x244/0x270
[ 118.367265] kthread+0x154/0x160
[ 118.375158] ret_from_fork+0x10/0x20
[ 118.383284]
[ 118.383284] other info that might help us debug this:
[ 118.383284]
[ 118.402810] Chain exists of:
[ 118.402810] &dev->power.lock --> hrtimer_bases.lock --> tk_core.seq.seqcount
[ 118.402810]
[ 118.425597] Possible unsafe locking scenario:
[ 118.425597]
[ 118.439130] CPU0 CPU1
[ 118.447413] ---- ----
[ 118.455641] lock(tk_core.seq.seqcount);
[ 118.463335] lock(hrtimer_bases.lock);
[ 118.473507] lock(tk_core.seq.seqcount);
[ 118.483787] lock(&dev->power.lock);
[ 118.491120]
[ 118.491120] *** DEADLOCK ***
[ 118.491120]
[ 118.507666] 2 locks held by migration/0/11:
[ 118.515424] #0: ffff8000113c9278 (timekeeper_lock){-.-.}-{2:2}, at: change_clocksource+0x2c/0x160
[ 118.528257] #1: ffff8000113c8f88 (tk_core.seq.seqcount){----}-{0:0}, at: multi_cpu_stop+0xa4/0x190
[ 118.541248]
[ 118.541248] stack backtrace:
[ 118.553226] CPU: 0 PID: 11 Comm: migration/0 Not tainted 5.10.0-rc6-arm64-renesas-00002-ga0cf8106e584 #37
[ 118.566923] Hardware name: Renesas Salvator-X 2nd version board based on r8a77965 (DT)
[ 118.579051] Call trace:
[ 118.585649] dump_backtrace+0x0/0x190
[ 118.593505] show_stack+0x14/0x30
[ 118.601001] dump_stack+0xe8/0x130
[ 118.608567] print_circular_bug+0x1f0/0x200
[ 118.616930] check_noncircular+0x128/0x140
[ 118.625231] __lock_acquire+0x13b0/0x204c
[ 118.633451] lock_acquire.part.0+0x120/0x330
[ 118.641958] lock_acquire+0x64/0x80
[ 118.649630] _raw_spin_lock_irqsave+0x7c/0xc4
[ 118.658186] __pm_runtime_resume+0x40/0x74
[ 118.666483] sh_cmt_start+0x1c4/0x260
[ 118.674327] sh_cmt_clocksource_enable+0x28/0x50
[ 118.683170] change_clocksource+0x9c/0x160
[ 118.691460] multi_cpu_stop+0xa4/0x190
[ 118.699384] cpu_stopper_thread+0x90/0x154
[ 118.707672] smpboot_thread_fn+0x244/0x270
[ 118.715961] kthread+0x154/0x160
[ 118.723360] ret_from_fork+0x10/0x20
[ 118.731465] clocksource: Switched to clocksource e60f0000.timer

1. commit 4614e6adafa2c5e6 ("clocksource: add enable() and disable() callbacks")

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@xxxxxxxxxxxx>
---
kernel/time/timekeeping.c | 36 +++++++++++++++++++++++-------------
1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 6aee5768c86ff7db..26ef4c9ef5c57081 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1427,35 +1427,45 @@ static void __timekeeping_set_tai_offset(struct timekeeper *tk, s32 tai_offset)
static int change_clocksource(void *data)
{
struct timekeeper *tk = &tk_core.timekeeper;
- struct clocksource *new, *old;
+ struct clocksource *new, *old = NULL;
unsigned long flags;
+ bool change = false;

new = (struct clocksource *) data;

- raw_spin_lock_irqsave(&timekeeper_lock, flags);
- write_seqcount_begin(&tk_core.seq);
-
- timekeeping_forward_now(tk);
/*
* If the cs is in module, get a module reference. Succeeds
* for built-in code (owner == NULL) as well.
*/
if (try_module_get(new->owner)) {
- if (!new->enable || new->enable(new) == 0) {
- old = tk->tkr_mono.clock;
- tk_setup_internals(tk, new);
- if (old->disable)
- old->disable(old);
- module_put(old->owner);
- } else {
+ if (!new->enable || new->enable(new) == 0)
+ change = true;
+ else
module_put(new->owner);
- }
}
+
+ raw_spin_lock_irqsave(&timekeeper_lock, flags);
+ write_seqcount_begin(&tk_core.seq);
+
+ timekeeping_forward_now(tk);
+
+ if (change) {
+ old = tk->tkr_mono.clock;
+ tk_setup_internals(tk, new);
+ }
+
timekeeping_update(tk, TK_CLEAR_NTP | TK_MIRROR | TK_CLOCK_WAS_SET);

write_seqcount_end(&tk_core.seq);
raw_spin_unlock_irqrestore(&timekeeper_lock, flags);

+ if (old) {
+ if (old->disable)
+ old->disable(old);
+
+ module_put(old->owner);
+ }
+
return 0;
}

--
2.30.0