Re: [GIT pull] timer updates for 4.9

From: Linus Torvalds
Date: Sun Oct 23 2016 - 18:40:32 EST


On Sat, Oct 22, 2016 at 5:02 AM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> please pull the latest timers-urgent-for-linus git tree from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers-urgent-for-linus

Hmm. I just hit something that looks very much like a timer bug.

I'm working on cleaning up the KERN_CONT handling of printk.c, and in
the process I'm adding a timer that flushes incomplete lines to the
logs. The code is pretty simple, and it basically just does

static void deferred_cont_flush(void)
{
static DEFINE_TIMER(timer, flush_timer, 0, 0);

if (!cont.len)
return;
mod_timer(&timer, jiffies + HZ/10);
}

which is then called whenever we have a incomplete line (the
"flush_timer()" function will just flush the pending line to the
console and to the logs.

It worked fine in testing, but at shutdown I just got this:

do_trap: 165 callbacks suppressed
traps: gnome-shell[1339] trap int3 ip:7f465aace15b sp:7fffc54f1670 error:0

followed by

general protection fault: 0000 [#1] SMP
Modules linked in: fuse xt_CHECKSUM ipt_MASQUERADE
nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns
nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
xt_conntrack ebtable_broute bridge stp llc ebtable_nat ip6tabl$
wmi pinctrl_intel acpi_als kfifo_buf tpm_tis tpm_tis_core
industrialio acpi_pad tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc
dm_crypt i915 crct10dif_pclmul crc32_pclmul crc32c_intel i2c_algo_bit
drm_kms_helper syscopyarea se$
CPU: 7 PID: 1339 Comm: gnome-shell Tainted: G O
4.9.0-rc1-00244-g0c2b6dc4fd4f-dirty #6
Hardware name: System manufacturer System Product Name/Z170-K,
BIOS 1803 05/06/2016
task: ffff95c443bcd940 task.stack: ffffa6ce82d6c000
RIP: 0010:[<ffffffffb10f9df1>] [<ffffffffb10f9df1>]
detach_if_pending+0x41/0xd0
RSP: 0000:ffffa6ce82d6fd08 EFLAGS: 00010086
RAX: dead000000000200 RBX: 0000000100089285 RCX: 0000000000000000
RDX: ffffffffb1c99580 RSI: ffff95c476ccf140 RDI: ffffffffb1c50920
RBP: ffffa6ce82d6fd20 R08: 000000000000004a R09: 0000000000000000
R10: 000000000000000f R11: ffffffffb1f0912c R12: ffffffffb1c50920
R13: 0000000000000000 R14: ffff95c476ccf140 R15: ffffffffb1c50920
FS: 00007f46639adac0(0000) GS:ffff95c476dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fea54725010 CR3: 0000000410fa3000 CR4: 00000000003406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
0000000100089285 0000000000000052 0000000000000051 ffffa6ce82d6fd90
ffffffffb10fc4e0 ffffffffb1c50948 0000000100089222 00000000cbeae39b
ffffffffb1f0912b 0000000000000246 00000000cbeae39b ffffffffb1a074af
Call Trace:
[<ffffffffb10fc4e0>] mod_timer+0xd0/0x3b0
[<ffffffffb10e6670>] deferred_cont_flush.part.12+0x20/0x30
[<ffffffffb10e7139>] console_unlock+0x489/0x4a0
[<ffffffffb10e092e>] ? down_trylock+0x2e/0x40
[<ffffffffb10e7443>] vprintk_emit+0x2f3/0x480
[<ffffffffb10e77a9>] vprintk_default+0x29/0x40
[<ffffffffb11a1a3b>] printk+0x57/0x73
[<ffffffffb13c7521>] ? ___ratelimit+0x91/0xd0
[<ffffffffb101ed76>] do_trap+0x116/0x140
[<ffffffffb101f2bf>] do_int3+0xdf/0x110
[<ffffffffb17a17cb>] int3+0x5b/0x60
Code: 41 55 41 54 53 48 83 3f 00 0f 84 81 00 00 00 41 89 d5 49 89
fc 0f 1f 44 00 00 49 8b 04 24 49 8b 54 24 08 48 85 c0 48 89 02 74 04
<48> 89 50 08 45 84 ed 74 09 49 c7 44 24 08 00 00 00 00 48 b8 00
RIP [<ffffffffb10f9df1>] detach_if_pending+0x41/0xd0
RSP <ffffa6ce82d6fd08>
---[ end trace e4a7055c685d5e1e ]---

(there was another GP fault at the same address immediately
afterwards, and then the machine was dead).

The code disassembly at the faulting instruction is

0: 48 89 50 08 mov %rdx,0x8(%rax)
4: 45 84 ed test %r13b,%r13b
7: 74 09 je 0x12

and obviously %rax is bogus. In fact, %rax has the value
"dead000000000200", which is just LIST_POISON2.

Now, normally, I'd just say "my code is obviously buggy crap", and I
did spend a bit of time on that theory. But quite frankly, that timer
really _is_ local to that one function, and the absolutely only thing
that is ever done on that timer is that "mod_timer()" call.

So I don't think it's my new code. What triggers this is do_trap() doing two

pr_info("%s[%d] trap %s ip:%lx sp:%lx error:%lx",
tsk->comm, tsk->pid, str,
regs->ip, regs->sp, error_code);
print_vma_addr(" in ", regs->ip);
pr_cont("\n");

which does two printk's without '\n' in close succession, so the first
one will turn on the timer, and then the second one will update it
again very soon afterwards.

So I started looking at mod_timer() and detach_if_pending(), but I
don't see anything obvious. I can certainly find the code: the oops
clearly happens in detach_timer():

__hlist_del(entry);
if (clear_pending)
entry->pprev = NULL;
entry->next = LIST_POISON2;

and the "test %r13b,%r13b" instruction that follows the GP
insutrction is the test of "clear_pending", and the access that faults
is the

if (next)
next->pprev = pprev;

in __hlist_del(), where "next" is that %rax LIST_POISON2 value.

So it smells like "entry" was simply detached _twice_. The second time
it hit that LIST_POISON2 poisoning and died.

Some locking problem in the timer handling?

Without knowing anythign at all, I'd be inclined to blame f00c0afdfa62
("timers: Implement optimization for same expiry time in mod_timer()")
which seems to do unlocked games with mod_timer().

But I may be full of crap. I literally just looked at "what has
changed recently in this area", on the assumption that timers have
historically been robust.

The fact that this happened not during my testing (I had test cases
that did similar things), but at shutdown, makes me also wonder if
maybe some CPU hot-unplug thing might have been involved. Concurrent
timer migration from a CPU that was going down? Although honestly, it
doesn't seem like that, this happens when gnome-shell is exiting, and
it's simply not that late in the shutdown process yet.

I don't know. It still bothers me that this happens with new code that
I was just testing out, and my gut feel is that "timers are so core
and tested from other code that it can't *possibly* be bug in
mod_timer()". But the code really is *so* simple: literally a single
static timer that is only _ever_ touched by "mod_timer()" and
absolutely nothing else. So even if everything else in my KERN_CONT
cleanup was completely buggy crap, I'm not seeing how it could screw
up the timer.

Ideas?

Linus