Timer refuses to expire

From: Paul E. McKenney
Date: Fri Dec 01 2017 - 13:25:38 EST


Hello, Anna-Maria,

It turned out that one set of problems was due to NO_HZ_FULL issues,
which are addressed with a pair of patches. However, there is still one
case of a timer refusing to expire in rcutorture's TREE01 test scenario.
The takes from two to four hours to trigger, and the most recent one
gives the following diagnostic (see patch at the very end of this email):

[13166.127458] schedule_timeout: Waylayed timer base->clk: 0x100c40004 jiffies: 0x100c4524e base->next_expiry: 0x100c40004 timer->flags: 0x1000003 timer->expires 0x100c40003 idx: 4 idx_now: ea base->pending_map 000000000000001000000000001080000000000000040000000002000000000000000000000000000000000000040000000000000000000000000000000000000000000000000000

The message appears any time a timer for less than five jiffies takes
more than eight seconds to expire. In all cases, this expiry did not
happen naturally, but rather via an unsolicited wakeup from the RCU CPU
stall code. If I am interpreting this message correctly, base->clk
has advanced past this timer, and the timer is correctly flagged in
base->pending_map. This seems like part of the problem because the
timer's timeout was only three jiffies. However, base->clk has not
advanced for more than 20 seconds, which seems like another problem.

What additional diagnostics would be helpful? I can capture data
at the beginning of the schedule_timeout in the timer_list structure,
and of course can print more information at the time of the wakeup.

------------------------------------------------------------------------

[13144.096640] rcu_bh-torture:torture_onoff task: onlining 3
[13144.105877] smpboot: Booting Node 0 Processor 3 APIC 0x3
[13144.266716] rcu_bh-torture:torture_onoff task: onlined 3
[13147.296610] rcu_bh-torture:torture_onoff task: offlining 6
[13147.616632] rcu_bh-torture: rtc: ffffffffb6dd9ee0 ver: 221994 tfle: 0 rta: 221995 rtaf: 0 rtf: 221981 rtmbe: 0 rtbe: 0 rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 16382958 onoff: 1824/1824:1826/1827 49,354:25,605 264279:246163 (HZ=1000) barrier: 87633/87633:0 cbflood: 12833
[13147.623449] rcu_bh-torture: Reader Pipe: 163134304753 676327 0 0 0 0 0 0 0 0 0
[13147.624931] rcu_bh-torture: Reader Batch: 163051334134 83646946 0 0 0 0 0 0 0 0 0
[13147.627426] rcu_bh-torture: Free-Block Circulation: 221994 221993 221992 221988 221987 221986 221985 221984 221983 221981 0
[13162.976784] rcu_bh-torture: rtc: ffffffffb6dda690 ver: 222249 tfle: 0 rta: 222249 rtaf: 0 rtf: 222240 rtmbe: 0 rtbe: 0 rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 16402175 onoff: 1824/1824:1826/1827 49,354:25,605 264279:246163 (HZ=1000) barrier: 87737/87737:0 cbflood: 12849
[13162.981280] rcu_bh-torture: Reader Pipe: 163340804815 677169 0 0 0 0 0 0 0 0 0
[13162.983225] rcu_bh-torture: Reader Batch: 163257732604 83749380 0 0 0 0 0 0 0 0 0
[13162.985163] rcu_bh-torture: Free-Block Circulation: 222248 222248 222247 222246 222245 222244 222243 222242 222241 222240 0
[13166.058496] INFO: rcu_sched detected stalls on CPUs/tasks:
[13166.060682] (detected by 34, t=21004 jiffies, g=20232, c=20231, q=2)
[13166.061863] All QSes seen, last rcu_sched kthread activity 21004 (4307833356-4307812352), jiffies_till_next_fqs=3, root ->qsmask 0x0
[13166.065346] rcu_torture_rea R running task 15016 835 2 0x80000000
[13166.066840] Call Trace:
[13166.068532] <IRQ>
[13166.069337] sched_show_task+0x105/0x130
[13166.070351] rcu_check_callbacks+0x9dd/0x9f0
[13166.071418] ? tick_sched_do_timer+0x40/0x40
[13166.072423] update_process_times+0x23/0x50
[13166.073491] tick_sched_handle+0x30/0x50
[13166.074418] tick_sched_timer+0x2f/0x70
[13166.075307] __hrtimer_run_queues+0x8d/0x250
[13166.076302] hrtimer_interrupt+0xad/0x200
[13166.077390] smp_apic_timer_interrupt+0x58/0x150
[13166.078533] apic_timer_interrupt+0x89/0x90
[13166.079498] </IRQ>
[13166.080024] RIP: 0010:delay_tsc+0x2e/0xb0
[13166.080958] RSP: 0018:ffffbb5281733e30 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff11
[13166.084359] RAX: 0000000080000200 RBX: 000017e66223c0c9 RCX: 0000000000000201
[13166.085994] RDX: 0000000080000201 RSI: ffffffffb51f1e29 RDI: 00000000ffffffff
[13166.087579] RBP: 000017e662222311 R08: 000000000000000b R09: 0000000000000101
[13166.089183] R10: ffffe543c043f340 R11: ffffbb5281733e58 R12: 000000000006169f
[13166.092235] R13: 0000000000000022 R14: 0000000000069fa8 R15: ffffffffb6dd9b20
[13166.095399] rcu_read_delay+0x104/0x110
[13166.097195] rcu_torture_reader+0x1a4/0x320
[13166.099012] ? rcu_torture_reader+0x320/0x320
[13166.100939] ? kthread+0xf0/0x130
[13166.102446] kthread+0xf0/0x130
[13166.103841] ? rcu_torture_timer_cb+0x10/0x10
[13166.105892] ? kthread_destroy_worker+0x40/0x40
[13166.107804] ret_from_fork+0x1f/0x30
[13166.109304] rcu_sched kthread starved for 21052 jiffies! g20232 c20231 f0x0 RCU_GP_DONE_GPS(2) ->state=0x2 ->cpu=3
[13166.113636] rcu_sched D14992 9 2 0x80000000
[13166.116050] Call Trace:
[13166.117201] ? __schedule+0x33c/0x6f0
[13166.118823] ? preempt_count_add+0x51/0x90
[13166.120168] schedule+0x37/0x90
[13166.121002] schedule_timeout+0x159/0x450
[13166.121969] ? __next_timer_interrupt+0xc0/0xc0
[13166.123007] rcu_gp_kthread+0x226/0x1110
[13166.123909] kthread+0xf0/0x130
[13166.124702] ? rcu_barrier+0x10/0x10
[13166.125548] ? kthread_destroy_worker+0x40/0x40
[13166.126575] ret_from_fork+0x1f/0x30
[13166.127458] schedule_timeout: Waylayed timer base->clk: 0x100c40004 jiffies: 0x100c4524e base->next_expiry: 0x100c40004 timer->flags: 0x1000003 timer->expires 0x100c40003 idx: 4 idx_now: ea base->pending_map 000000000000001000000000001080000000000000040000000002000000000000000000000000000000000000040000000000000000000000000000000000000000000000000000

------------------------------------------------------------------------

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index ffebcf878fba..23af27461d8c 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1755,8 +1755,13 @@ static void process_timeout(struct timer_list *t)
*/
signed long __sched schedule_timeout(signed long timeout)
{
+ struct timer_base *base;
struct process_timer timer;
unsigned long expire;
+ unsigned long flags;
+ unsigned long i;
+ unsigned int idx, idx_now;
+ unsigned long j;

switch (timeout)
{
@@ -1793,6 +1798,17 @@ signed long __sched schedule_timeout(signed long timeout)
timer_setup_on_stack(&timer.timer, process_timeout, 0);
__mod_timer(&timer.timer, expire, 0);
schedule();
+ j = jiffies;
+ if (timeout < 5 && time_after(j, expire + 8 * HZ) && timer_pending(&timer.timer)) {
+ base = lock_timer_base(&timer.timer, &flags);
+ idx = timer_get_idx(&timer.timer);
+ idx_now = calc_wheel_index(j, base->clk);
+ raw_spin_unlock_irqrestore(&base->lock, flags);
+ pr_info("%s: Waylayed timer base->clk: %#lx jiffies: %#lx base->next_expiry: %#lx timer->flags: %#x timer->expires %#lx idx: %x idx_now: %x base->pending_map ", __func__, base->clk, j, base->next_expiry, timer.timer.flags, timer.timer.expires, idx, idx_now);
+ for (i = 0; i < WHEEL_SIZE / sizeof(base->pending_map[0]) / 8; i++)
+ pr_cont("%016lx", base->pending_map[i]);
+ pr_cont("\n");
+ }
del_singleshot_timer_sync(&timer.timer);

/* Remove the timer from the object tracker */