Re: WARNING in tracepoint_probe_register_prio

From: Jens Axboe
Date: Tue Oct 31 2017 - 18:30:50 EST


On 10/31/2017 04:21 PM, Steven Rostedt wrote:
> On Sun, 29 Oct 2017 14:24:01 -0700
> syzbot <bot+2acf619eef45c8f8b322a24d8e528eccbe6319bc@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>> syzkaller has found reproducer for the following crash on
>> 36ef71cae353f88fd6e095e2aaa3e5953af1685d
>
> So this fuzzer triggers this.
>
>> git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/master
>> compiler: gcc (GCC) 7.1.1 20170620
>> .config is attached
>> Raw console output is attached.
>> C reproducer is attached
>> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
>> for information about syzkaller reproducers
>>
>>
>> WARNING: CPU: 0 PID: 2995 at kernel/tracepoint.c:210 tracepoint_add_func
>> kernel/tracepoint.c:210 [inline]
>
> Which is this:
>
> old = func_add(&tp_funcs, func, prio);
> if (IS_ERR(old)) {
> WARN_ON_ONCE(1);
> return PTR_ERR(old);
> }
>
> Which means that func_add() returned a warning.
>
> The possible warnings are:
>
> if (WARN_ON(!tp_func->func))
> return ERR_PTR(-EINVAL);
>
> if (old[nr_probes].func == tp_func->func &&
> old[nr_probes].data == tp_func->data)
> return ERR_PTR(-EEXIST);
>
> new = allocate_probes(nr_probes + 2);
> if (new == NULL)
> return ERR_PTR(-ENOMEM);
>
> Which means that either the tp_func->func didn't have a function. I
> doubt that from the back trace, it is just the registering of the
> tracepoints which should always have func set.
>
> I doubt we ran out of memory here. Possible, but unlikely (this was
> reproduced twice with the same path).
>
> Which leaves us with a tracepoint that was registered twice.
>
> There's nothing in register_trace_*() that protects it. The blktrace
> code needs to have some protection to know if it registered the
> tracepoints once, otherwise this will trigger.
>
>
>> WARNING: CPU: 0 PID: 2995 at kernel/tracepoint.c:210
>> tracepoint_probe_register_prio+0x397/0x9a0 kernel/tracepoint.c:283
>> Kernel panic - not syncing: panic_on_warn set ...
>
> It panics because "panic_on_warn" is set and we just did a warning.
>
>>
>> CPU: 0 PID: 2995 Comm: syzkaller857118 Not tainted
>> 4.14.0-rc5-next-20171018+ #36
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> Google 01/01/2011
>> Call Trace:
>> __dump_stack lib/dump_stack.c:16 [inline]
>> dump_stack+0x194/0x257 lib/dump_stack.c:52
>> panic+0x1e4/0x41c kernel/panic.c:183
>> __warn+0x1c4/0x1e0 kernel/panic.c:546
>> report_bug+0x211/0x2d0 lib/bug.c:183
>> fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:177
>> do_trap_no_signal arch/x86/kernel/traps.c:211 [inline]
>> do_trap+0x260/0x390 arch/x86/kernel/traps.c:260
>> do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:297
>> do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:310
>> invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
>> RIP: 0010:tracepoint_add_func kernel/tracepoint.c:210 [inline]
>> RIP: 0010:tracepoint_probe_register_prio+0x397/0x9a0 kernel/tracepoint.c:283
>> RSP: 0018:ffff8801d1d1f6c0 EFLAGS: 00010293
>> RAX: ffff8801d22e8540 RBX: 00000000ffffffef RCX: ffffffff81710f07
>> RDX: 0000000000000000 RSI: ffffffff85b679c0 RDI: ffff8801d5f19818
>> RBP: ffff8801d1d1f7c8 R08: ffffffff81710c10 R09: 0000000000000004
>> R10: ffff8801d1d1f6b0 R11: 0000000000000003 R12: ffffffff817597f0
>> R13: 0000000000000000 R14: 00000000ffffffff R15: ffff8801d1d1f7a0
>> tracepoint_probe_register+0x2a/0x40 kernel/tracepoint.c:304
>> register_trace_block_rq_insert include/trace/events/block.h:191 [inline]
>> blk_register_tracepoints+0x1e/0x2f0 kernel/trace/blktrace.c:1043
>> do_blk_trace_setup+0xa10/0xcf0 kernel/trace/blktrace.c:542
>> blk_trace_setup+0xbd/0x180 kernel/trace/blktrace.c:564
>
> I'm guessing the blk_trace_setup should have a mutex or something to
> protect this. Atomic counters is not enough.
>
>
> CPU0 CPU1
> ---- ----
> if (atomic_inc_return(&blk_probes_ref) == 1)
> blk_register_tracepoints();
>
> if (atomic_dec_and_test(&blk_probes_ref))
>
> if (atomic_inc_return(&blk_probes_ref) == 1)
> blk_register_tracepoints();
>
> blk_unregister_tracepoints();
>
> Would cause this to trigger.

This code dates back to:

commit c71a896154119f4ca9e89d6078f5f63ad60ef199
Author: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
Date: Fri Jan 23 12:06:27 2009 -0200

blktrace: add ftrace plugin

so not really a recent regression :-)

I'll take a look at it.

--
Jens Axboe