Re: [patch 2/2] timekeeping: Always check for negative motion

From: Guenter Roeck
Date: Fri Nov 29 2024 - 11:09:28 EST


On 11/29/24 04:16, Thomas Gleixner wrote:
On Thu, Nov 28 2024 at 07:30, Guenter Roeck wrote:
On 11/28/24 06:51, Thomas Gleixner wrote:
[ 19.090000] ###### now: b5ac62 last: d447e38 mask: ffffff return: 712e2a
[ 19.110000] ###### now: b9ebc3 last: db506d0 mask: ffffff return: 4e4f3
[ 19.120000] ###### now: bb842f last: db8d760 mask: ffffff return: 2accf
[ 19.160000] ###### now: c43f2e last: dbabfa8 mask: ffffff return: 97f86

'last' advances beyond 'mask', and after that the result is always bad. The call to
clocksource_delta() is from timekeeping_advance().

This does not make any sense. The bits above the mask in cycle_last are
irrelevant:

delta = (now - last) & mask;

[ 3.350000] ###### now: 6c4f last: fe6a84 mask: ffffff return: 201cb <---
[ 3.360000] ###### now: 40427 last: 10052cc mask: ffffff return: 3b15b <---

0x40427 - 0x10052cc = 0xffffffffff03b15b
0xffffffffff03b15b & 0xffffff = 0x3b15b

That's not any different than before. The only difference is that the
return value is checked:

return delta & ~(mask >> 1) ? 0 : delta;

But clearly none of the resulting deltas (after masking) has bit 23
set. So the function can't return 0, right?

Coming back to my earlier assumption vs. the max idle time. Here is a
long idle sleep:

[ 18.500000] ###### now: 45b0a2 last: d1c7050 mask: ffffff return: 294052
[ 19.090000] ###### now: b5ac62 last: d447e38 mask: ffffff return: 712e2a

The cycle interval is 125000 clock cycles per tick. That's a HZ=100
kernel, so the nominal clock frequency is 12.5 MHz.

0x712e2a/12.5e6 = 0.593391s

which is close to the 597268854ns max_idle_ns value.

That's about 0.0776978s away from the point where the delta becomes >
mask/2. So there is a valid concern vs. these long idle sleeps on
machines with a small counter width.

But none of this explains the problems you are observing.

Can you instrument clocksource_delta() to print the values only when the
negative motion detection triggers?


With added WARN_ONCE() after the problem is seen for the first time:

[ 13.860000] ##### now=0xd60127 last=0x85170f4 mask=0xffffff ret=0x849033
[ 13.860000] ------------[ cut here ]------------
[ 13.860000] WARNING: CPU: 0 PID: 0 at kernel/time/timekeeping_internal.h:44 timekeeping_advance+0x844/0x9d0
[ 13.860000] clocksource_delta() time going backward: now=0xd60127 last=0x85170f4 mask=0xffffff ret=0x849033
[ 13.860000] Modules linked in:
[ 13.860000] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.0-10689-g7af08b57bcb9-dirty #1
[ 13.860000] Hardware name: Generic DT based system
[ 13.860000] Call trace:
[ 13.860000] unwind_backtrace from show_stack+0x10/0x14
[ 13.860000] show_stack from dump_stack_lvl+0x64/0x88
[ 13.860000] dump_stack_lvl from __warn+0x7c/0x1ac
[ 13.860000] __warn from warn_slowpath_fmt+0x130/0x1a4
[ 13.860000] warn_slowpath_fmt from timekeeping_advance+0x844/0x9d0
[ 13.860000] timekeeping_advance from update_wall_time+0xc/0x1c
[ 13.860000] update_wall_time from tick_irq_enter+0x6c/0xb0
[ 13.860000] tick_irq_enter from generic_handle_arch_irq+0xc/0x44
[ 13.860000] generic_handle_arch_irq from __irq_svc+0x88/0xb0
[ 13.860000] Exception stack(0xc2201f28 to 0xc2201f70)
[ 13.860000] 1f20: 00000001 2d639000 c220da80 00000001 c220da80 c2208f14
[ 13.860000] 1f40: c24af000 c2208f54 00000000 00000000 00000097 00000000 c2208f14 c2201f78
[ 13.860000] 1f60: c1568504 c1568488 20000013 ffffffff
[ 13.860000] __irq_svc from default_idle_call+0x1c/0x1bc
[ 13.860000] default_idle_call from do_idle+0x200/0x288
[ 13.860000] do_idle from cpu_startup_entry+0x28/0x2c
[ 13.860000] cpu_startup_entry from kernel_init+0x0/0x12c
[ 13.860000] kernel_init from start_kernel+0x6a8/0x6c4
[ 13.860000] ---[ end trace 0000000000000000 ]---
[ 13.870000] ##### now=0xd747fa last=0x85170f4 mask=0xffffff ret=0x85d706
[ 13.880000] ##### now=0xda5bf3 last=0x85170f4 mask=0xffffff ret=0x88eaff
[ 13.890000] ##### now=0xdb1868 last=0x85170f4 mask=0xffffff ret=0x89a774
[ 13.940000] ##### now=0xe541ba last=0x85170f4 mask=0xffffff ret=0x93d0c6
[ 13.950000] ##### now=0xe63e66 last=0x85170f4 mask=0xffffff ret=0x94cd72
[ 13.950000] platform iio-hwmon: deferred probe pending: iio_hwmon: Failed to get channels
[ 13.960000] ##### now=0xe863a8 last=0x85170f4 mask=0xffffff ret=0x96f2b4
[ 13.970000] ##### now=0xeba3a3 last=0x85170f4 mask=0xffffff ret=0x9a32af
[ 13.980000] ##### now=0xec1406 last=0x85170f4 mask=0xffffff ret=0x9aa312
[ 13.990000] ##### now=0xef85a4 last=0x85170f4 mask=0xffffff ret=0x9e14b0
[ 14.000000] ##### now=0xf0f07b last=0x85170f4 mask=0xffffff ret=0x9f7f87
[ 14.010000] ##### now=0xf1bff6 last=0x85170f4 mask=0xffffff ret=0xa04f02
[ 14.150000] ##### now=0xc609c last=0x85170f4 mask=0xffffff ret=0xbaefa8
[ 14.360000] ##### now=0x357f2d last=0x85170f4 mask=0xffffff ret=0xe40e39
[ 14.410000] ##### now=0x3e0a3a last=0x85170f4 mask=0xffffff ret=0xec9946
[ 14.420000] ##### now=0x3fed36 last=0x85170f4 mask=0xffffff ret=0xee7c42
[ 14.430000] ##### now=0x42b58a last=0x85170f4 mask=0xffffff ret=0xf14496
[ 14.440000] ##### now=0x44ba17 last=0x85170f4 mask=0xffffff ret=0xf34923
[ 14.450000] ##### now=0x46635e last=0x85170f4 mask=0xffffff ret=0xf4f26a
[ 14.480000] ##### now=0x4bd58e last=0x85170f4 mask=0xffffff ret=0xfa649a
[ 14.490000] ##### now=0x4d3b0c last=0x85170f4 mask=0xffffff ret=0xfbca18
[ 14.510000] ##### now=0x510ef4 last=0x85170f4 mask=0xffffff ret=0xff9e00

With more context (printing for each call to clocksource_delta() as well as the
actual return value):

[ 14.020000] ##### now=0x165aa2 last=0x913b49f mask=0xffffff ret=0x2a603 -> 0x2a603
[ 14.030000] ##### now=0x18aa49 last=0x9159ce7 mask=0xffffff ret=0x30d62 -> 0x30d62
[ 14.040000] ##### now=0x1a2971 last=0x917852f mask=0xffffff ret=0x2a442 -> 0x2a442
[ 14.050000] ##### now=0x1c78d5 last=0x9196d77 mask=0xffffff ret=0x30b5e -> 0x30b5e
[ 14.060000] ##### now=0x1dfe88 last=0x91b55bf mask=0xffffff ret=0x2a8c9 -> 0x2a8c9
[ 14.090000] ##### now=0x236d57 last=0x91d3e07 mask=0xffffff ret=0x62f50 -> 0x62f50
[ 14.180000] ##### now=0x34aaa3 last=0x922f6df mask=0xffffff ret=0x11b3c4 -> 0x11b3c4
[ 14.890000] ##### now=0xbd0aad last=0x9342167 mask=0xffffff ret=0x88e946 -> 0x0
[ 14.890000] ------------[ cut here ]------------
[ 14.890000] WARNING: CPU: 0 PID: 0 at kernel/time/timekeeping_internal.h:44 clocksource_delta+0x108/0x118
[ 14.890000] clocksource_delta() time going backward: now=0xbd0aad last=0x9342167 mask=0xffffff ret=0x88e946
[ 14.890000] Modules linked in:
[ 14.890000] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.0-10689-g7af08b57bcb9-dirty #1
[ 14.890000] Hardware name: Generic DT based system
[ 14.890000] Call trace:
[ 14.890000] unwind_backtrace from show_stack+0x10/0x14
[ 14.890000] show_stack from dump_stack_lvl+0x64/0x88
[ 14.890000] dump_stack_lvl from __warn+0x7c/0x1ac
[ 14.890000] __warn from warn_slowpath_fmt+0x130/0x1a4
[ 14.890000] warn_slowpath_fmt from clocksource_delta+0x108/0x118
[ 14.890000] clocksource_delta from timekeeping_advance+0x6c/0x8e4
[ 14.890000] timekeeping_advance from update_wall_time+0xc/0x1c
[ 14.890000] update_wall_time from tick_irq_enter+0x6c/0xb0
[ 14.890000] tick_irq_enter from generic_handle_arch_irq+0xc/0x44
[ 14.890000] generic_handle_arch_irq from __irq_svc+0x88/0xb0
[ 14.890000] Exception stack(0xc2201f28 to 0xc2201f70)
[ 14.890000] 1f20: 00000001 2d639000 c220da80 00000001 c220da80 c2208f14
[ 14.890000] 1f40: c24af000 c2208f54 00000000 00000000 00000091 00000000 c2208f14 c2201f78
[ 14.890000] 1f60: c15684c4 c1568448 20000013 ffffffff
[ 14.890000] __irq_svc from default_idle_call+0x1c/0x1bc
[ 14.890000] default_idle_call from do_idle+0x200/0x288
[ 14.890000] do_idle from cpu_startup_entry+0x28/0x2c
[ 14.890000] cpu_startup_entry from kernel_init+0x0/0x12c
[ 14.890000] kernel_init from start_kernel+0x6a8/0x6c4
[ 14.890000] ---[ end trace 0000000000000000 ]---
[ 14.900000] ##### now=0xbdfc5c last=0x9342167 mask=0xffffff ret=0x89daf5 -> 0x0
[ 14.950000] ##### now=0xc8827a last=0x9342167 mask=0xffffff ret=0x946113 -> 0x0
[ 15.060000] ##### now=0xdc3907 last=0x9342167 mask=0xffffff ret=0xa817a0 -> 0x0
[ 15.110000] ##### now=0xe6b484 last=0x9342167 mask=0xffffff ret=0xb2931d -> 0x0
Sent SIGKILL to all processes
Requesting system reboot
[ 15.120000] ##### now=0xe7fd9a last=0x9342167 mask=0xffffff ret=0xb3dc33 -> 0x0
[ 15.170000] ##### now=0xf277d5 last=0x9342167 mask=0xffffff ret=0xbe566e -> 0x0
[ 15.260000] ##### now=0x26c10 last=0x9342167 mask=0xffffff ret=0xce4aa9 -> 0x0
[ 15.310000] ##### now=0xce799 last=0x9342167 mask=0xffffff ret=0xd8c632 -> 0x0

So, yes, it looks like the problem is seen after a large delta, and persists from there.

Guenter