Re: __queue_work oops.

From: Dave Jones
Date: Mon Feb 27 2017 - 17:10:00 EST


On Mon, Feb 27, 2017 at 01:39:25PM -0500, Tejun Heo wrote:

> On Mon, Feb 27, 2017 at 12:14:39PM -0500, Dave Jones wrote:
> > Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.10.0-think+ #9
> > task: ffff88017f105440 task.stack: ffffc90000094000
> > RIP: 0010:__queue_work+0x2d/0x700
> > RSP: 0018:ffff880507c03df8 EFLAGS: 00010046
> > RAX: 0000000000000082 RBX: 0000000000000101 RCX: 0000000000000002
> > RDX: ffff88047bf07c98 RSI: 0000000000000000 RDI: 0000000000000000
> > RBP: ffff880507c03e30 R08: 0000000000000001 R09: ffffffff8294bf68
> > R10: ffff880507c03e58 R11: 0000000000000000 R12: ffff88047bf07ce8
> > R13: 0000000000000000 R14: 0000000000000000 R15: ffff88047bf07c98
> > FS: 0000000000000000(0000) GS:ffff880507c00000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00000000000001c2 CR3: 0000000004e11000 CR4: 00000000001406e0
> > Call Trace:
> > <IRQ>
> > ? work_on_cpu+0xb0/0xb0
> > delayed_work_timer_fn+0x1e/0x20
> > call_timer_fn+0xbd/0x480
> ...
> > Code starting with the faulting instruction
> > ===========================================
> > 0: 41 f6 85 c2 01 00 00 testb $0x1,0x1c2(%r13)
> > 7: 01
> > 8: 0f 85 22 04 00 00 jne 0x430
> > e: 49 rex.WB
> > f: bc eb 83 b5 80 mov $0x80b583eb,%esp
> > 14: 46 rex.RX
> >
> > 0000000000003cf0 <__queue_work>:
> > {
> > 3cf0: e8 00 00 00 00 callq 3cf5 <__queue_work+0x5>
> > 3cf5: 55 push %rbp
> > 3cf6: 48 89 e5 mov %rsp,%rbp
> > 3cf9: 41 57 push %r15
> > 3cfb: 49 89 d7 mov %rdx,%r15
> > 3cfe: 41 56 push %r14
> > unsigned int req_cpu = cpu;
> > 3d00: 41 89 fe mov %edi,%r14d
> > {
> > 3d03: 41 55 push %r13
> > 3d05: 49 89 f5 mov %rsi,%r13
> > 3d08: 41 54 push %r12
> > 3d0a: 53 push %rbx
> > 3d0b: 48 83 ec 10 sub $0x10,%rsp
> > 3d0f: 89 7d d4 mov %edi,-0x2c(%rbp)
> > asm volatile("# __raw_save_flags\n\t"
> > 3d12: 9c pushfq
> > 3d13: 58 pop %rax
> > WARN_ON_ONCE(!irqs_disabled());
> > 3d14: f6 c4 02 test $0x2,%ah
> > 3d17: 0f 85 06 04 00 00 jne 4123 <__queue_work+0x433>
> > if (unlikely(wq->flags & __WQ_DRAINING) &&
> > 3d1d: 41 f6 85 c2 01 00 00 testb $0x1,0x1c2(%r13)
> >
> >
> > So we called __queue_work with a null wq.
>
> So, that's somebody calling queue_delayed_work[_on]() with a NULL wq
> and when the timeout expires the timer callback trying to queue
> against NULL. Hmm... the work function would be able to tell us who
> queued it but it isn't part of the information dumped here (would be
> 0x18(%rdx)).
>
> I'll add a sanity check on queue_delayed_work_on() so that we can
> catch it synchronously when it happens.

I dumped work->func, and found it pointed to smc_close_sock_put_work

<smc maintainer cc'd>

Dave