Re: WARNING: ODEBUG bug in sock_hash_free

From: John Fastabend
Date: Mon Jul 02 2018 - 14:50:40 EST


On 06/25/2018 10:30 PM, syzbot wrote:
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:    f0dc7f9c6dd9 Merge git://git.kernel.org/pub/scm/linux/kern..
> git tree:       bpf-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1725589f800000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=fa9c20c48788d1c1
> dashboard link: https://syzkaller.appspot.com/bug?extid=71aeaaf993d216185076
> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>
> Unfortunately, I don't have any reproducer for this crash yet.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+71aeaaf993d216185076@xxxxxxxxxxxxxxxxxxxxxxxxx
>
> ------------[ cut here ]------------
> ODEBUG: free active (active state 1) object type: rcu_head hint:           (null)
> WARNING: CPU: 1 PID: 4959 at lib/debugobjects.c:329 debug_print_object+0x16a/0x210 lib/debugobjects.c:326
> Kernel panic - not syncing: panic_on_warn set ...
>
> CPU: 1 PID: 4959 Comm: kworker/1:3 Not tainted 4.17.0+ #39
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Workqueue: events bpf_map_free_deferred
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
>  panic+0x22f/0x4de kernel/panic.c:184
>  __warn.cold.8+0x163/0x1b3 kernel/panic.c:536
>  report_bug+0x252/0x2d0 lib/bug.c:186
>  fixup_bug arch/x86/kernel/traps.c:178 [inline]
>  do_error_trap+0x1fc/0x4d0 arch/x86/kernel/traps.c:296
>  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:316
>  invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992
> RIP: 0010:debug_print_object+0x16a/0x210 lib/debugobjects.c:326
> Code: 1a 88 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 92 00 00 00 48 8b 14 dd 60 75 1a 88 4c 89 f6 48 c7 c7 e0 6a 1a 88 e8 06 62 ec fd <0f> 0b 83 05 39 5b 44 06 01 48 83 c4 18 5b 41 5c 41 5d 41 5e 41 5f
> RSP: 0018:ffff880198e47490 EFLAGS: 00010082
> RAX: 0000000000000051 RBX: 0000000000000003 RCX: ffffffff81854ed8
> RDX: 0000000000000000 RSI: ffffffff8161f371 RDI: 0000000000000001
> RBP: ffff880198e474d0 R08: ffff8801d84b2240 R09: ffffed003b5e3ec2
> R10: ffffed003b5e3ec2 R11: ffff8801daf1f617 R12: 0000000000000001
> R13: ffffffff88f91d80 R14: ffffffff881a6f80 R15: 0000000000000000
>  __debug_check_no_obj_freed lib/debugobjects.c:783 [inline]
>  debug_check_no_obj_freed+0x3a6/0x584 lib/debugobjects.c:815
>  kfree+0xc7/0x260 mm/slab.c:3812
>  sock_hash_free+0x24e/0x6e0 kernel/bpf/sockmap.c:2093
>  bpf_map_free_deferred+0xba/0xf0 kernel/bpf/syscall.c:262
>  process_one_work+0xc64/0x1b70 kernel/workqueue.c:2153
>  worker_thread+0x181/0x13a0 kernel/workqueue.c:2296
>  kthread+0x345/0x410 kernel/kthread.c:240
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 4.17.0+ #39 Not tainted
> ------------------------------------------------------
> kworker/1:3/4959 is trying to acquire lock:
> 00000000190110fa ((console_sem).lock){-...}, at: down_trylock+0x13/0x70 kernel/locking/semaphore.c:136
>
> but task is already holding lock:
> 00000000af3150e8 (&obj_hash[i].lock){-.-.}, at: __debug_check_no_obj_freed lib/debugobjects.c:774 [inline]
> 00000000af3150e8 (&obj_hash[i].lock){-.-.}, at: debug_check_no_obj_freed+0x159/0x584 lib/debugobjects.c:815
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #3 (&obj_hash[i].lock){-.-.}:
>        __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
>        _raw_spin_lock_irqsave+0x96/0xc0 kernel/locking/spinlock.c:152
>        __debug_object_init+0x11f/0x12c0 lib/debugobjects.c:381
>        debug_object_init+0x16/0x20 lib/debugobjects.c:429
>        debug_hrtimer_init kernel/time/hrtimer.c:410 [inline]
>        debug_init kernel/time/hrtimer.c:458 [inline]
>        hrtimer_init+0x8f/0x460 kernel/time/hrtimer.c:1308
>        init_dl_task_timer+0x1b/0x50 kernel/sched/deadline.c:1056
>        __sched_fork+0x2a8/0x570 kernel/sched/core.c:2184
>        init_idle+0x75/0x7a0 kernel/sched/core.c:5404
>        sched_init+0xbeb/0xd10 kernel/sched/core.c:6102
>        start_kernel+0x475/0x92d init/main.c:602
>        x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:452
>        x86_64_start_kernel+0x76/0x79 arch/x86/kernel/head64.c:433
>        secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:242
>
> -> #2 (&rq->lock){-.-.}:
>        __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
>        _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:144
>        rq_lock kernel/sched/sched.h:1805 [inline]
>        task_fork_fair+0x8a/0x660 kernel/sched/fair.c:9953
>        sched_fork+0x43e/0xb30 kernel/sched/core.c:2380
>        copy_process.part.38+0x1bf1/0x7180 kernel/fork.c:1765
>        copy_process kernel/fork.c:1608 [inline]
>        _do_fork+0x291/0x12a0 kernel/fork.c:2091
>        kernel_thread+0x34/0x40 kernel/fork.c:2150
>        rest_init+0x22/0xe4 init/main.c:408
>        start_kernel+0x906/0x92d init/main.c:738
>        x86_64_start_reservations+0x29/0x2b arch/x86/kernel/head64.c:452
>        x86_64_start_kernel+0x76/0x79 arch/x86/kernel/head64.c:433
>        secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:242
>
> -> #1 (&p->pi_lock){-.-.}:
>        __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
>        _raw_spin_lock_irqsave+0x96/0xc0 kernel/locking/spinlock.c:152
>        try_to_wake_up+0xca/0x1280 kernel/sched/core.c:1984
>        wake_up_process+0x10/0x20 kernel/sched/core.c:2147
>        __up.isra.1+0x1b8/0x290 kernel/locking/semaphore.c:262
>        up+0x12f/0x1b0 kernel/locking/semaphore.c:187
>        __up_console_sem+0xbe/0x1b0 kernel/printk/printk.c:242
>        console_unlock+0x79a/0x10a0 kernel/printk/printk.c:2411
>        vprintk_emit+0x6b2/0xde0 kernel/printk/printk.c:1907
>        vprintk_default+0x28/0x30 kernel/printk/printk.c:1948
>        vprintk_func+0x7a/0xe7 kernel/printk/printk_safe.c:382
>        printk+0x9e/0xba kernel/printk/printk.c:1981
>        load_umh+0x51/0xbd net/bpfilter/bpfilter_kern.c:99
>        do_one_initcall+0x127/0x913 init/main.c:884
>        do_initcall_level init/main.c:952 [inline]
>        do_initcalls init/main.c:960 [inline]
>        do_basic_setup init/main.c:978 [inline]
>        kernel_init_freeable+0x49b/0x58e init/main.c:1135
>        kernel_init+0x11/0x1b3 init/main.c:1061
>        ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
>
> -> #0 ((console_sem).lock){-...}:
>        lock_acquire+0x1dc/0x520 kernel/locking/lockdep.c:3924
>        __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
>        _raw_spin_lock_irqsave+0x96/0xc0 kernel/locking/spinlock.c:152
>        down_trylock+0x13/0x70 kernel/locking/semaphore.c:136
>        __down_trylock_console_sem+0xae/0x200 kernel/printk/printk.c:225
>        console_trylock+0x15/0xa0 kernel/printk/printk.c:2230
>        console_trylock_spinning kernel/printk/printk.c:1643 [inline]
>        vprintk_emit+0x699/0xde0 kernel/printk/printk.c:1906
>        vprintk_default+0x28/0x30 kernel/printk/printk.c:1948
>        vprintk_func+0x7a/0xe7 kernel/printk/printk_safe.c:382
>        printk+0x9e/0xba kernel/printk/printk.c:1981
>        __warn_printk+0x83/0xd0 kernel/panic.c:590
>        debug_print_object+0x16a/0x210 lib/debugobjects.c:326
>        __debug_check_no_obj_freed lib/debugobjects.c:783 [inline]
>        debug_check_no_obj_freed+0x3a6/0x584 lib/debugobjects.c:815
>        kfree+0xc7/0x260 mm/slab.c:3812
>        sock_hash_free+0x24e/0x6e0 kernel/bpf/sockmap.c:2093
>        bpf_map_free_deferred+0xba/0xf0 kernel/bpf/syscall.c:262
>        process_one_work+0xc64/0x1b70 kernel/workqueue.c:2153
>        worker_thread+0x181/0x13a0 kernel/workqueue.c:2296
>        kthread+0x345/0x410 kernel/kthread.c:240
>        ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
>
> other info that might help us debug this:
>
> Chain exists of:
>   (console_sem).lock --> &rq->lock --> &obj_hash[i].lock
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&obj_hash[i].lock);
>                                lock(&rq->lock);
>                                lock(&obj_hash[i].lock);
>   lock((console_sem).lock);
>
>  *** DEADLOCK ***
>
> 4 locks held by kworker/1:3/4959:
>  #0: 00000000f67deee4 ((wq_completion)"events"){+.+.}, at: __write_once_size include/linux/compiler.h:215 [inline]
>  #0: 00000000f67deee4 ((wq_completion)"events"){+.+.}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
>  #0: 00000000f67deee4 ((wq_completion)"events"){+.+.}, at: atomic64_set include/asm-generic/atomic-instrumented.h:40 [inline]
>  #0: 00000000f67deee4 ((wq_completion)"events"){+.+.}, at: atomic_long_set include/asm-generic/atomic-long.h:59 [inline]
>  #0: 00000000f67deee4 ((wq_completion)"events"){+.+.}, at: set_work_data kernel/workqueue.c:617 [inline]
>  #0: 00000000f67deee4 ((wq_completion)"events"){+.+.}, at: set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
>  #0: 00000000f67deee4 ((wq_completion)"events"){+.+.}, at: process_one_work+0xb35/0x1b70 kernel/workqueue.c:2124
>  #1: 00000000776b40d0 ((work_completion)(&map->work)){+.+.}, at: process_one_work+0xb8c/0x1b70 kernel/workqueue.c:2128
>  #2: 000000002a359661 (rcu_read_lock){....}, at: sock_hash_free+0x0/0x6e0 include/net/sock.h:2176
>  #3: 00000000af3150e8 (&obj_hash[i].lock){-.-.}, at: __debug_check_no_obj_freed lib/debugobjects.c:774 [inline]
>  #3: 00000000af3150e8 (&obj_hash[i].lock){-.-.}, at: debug_check_no_obj_freed+0x159/0x584 lib/debugobjects.c:815
>
> stack backtrace:
> CPU: 1 PID: 4959 Comm: kworker/1:3 Not tainted 4.17.0+ #39
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Workqueue: events bpf_map_free_deferred
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
>  print_circular_bug.isra.36.cold.56+0x1bd/0x27d kernel/locking/lockdep.c:1227
>  check_prev_add kernel/locking/lockdep.c:1867 [inline]
>  check_prevs_add kernel/locking/lockdep.c:1980 [inline]
>  validate_chain kernel/locking/lockdep.c:2421 [inline]
>  __lock_acquire+0x343e/0x5140 kernel/locking/lockdep.c:3435
>  lock_acquire+0x1dc/0x520 kernel/locking/lockdep.c:3924
>  __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
>  _raw_spin_lock_irqsave+0x96/0xc0 kernel/locking/spinlock.c:152
>  down_trylock+0x13/0x70 kernel/locking/semaphore.c:136
>  __down_trylock_console_sem+0xae/0x200 kernel/printk/printk.c:225
>  console_trylock+0x15/0xa0 kernel/printk/printk.c:2230
>  console_trylock_spinning kernel/printk/printk.c:1643 [inline]
>  vprintk_emit+0x699/0xde0 kernel/printk/printk.c:1906
>  vprintk_default+0x28/0x30 kernel/printk/printk.c:1948
>  vprintk_func+0x7a/0xe7 kernel/printk/printk_safe.c:382
>  printk+0x9e/0xba kernel/printk/printk.c:1981
>  __warn_printk+0x83/0xd0 kernel/panic.c:590
>  debug_print_object+0x16a/0x210 lib/debugobjects.c:326
>  __debug_check_no_obj_freed lib/debugobjects.c:783 [inline]
>  debug_check_no_obj_freed+0x3a6/0x584 lib/debugobjects.c:815
>  kfree+0xc7/0x260 mm/slab.c:3812
>  sock_hash_free+0x24e/0x6e0 kernel/bpf/sockmap.c:2093
>  bpf_map_free_deferred+0xba/0xf0 kernel/bpf/syscall.c:262
>  process_one_work+0xc64/0x1b70 kernel/workqueue.c:2153
>  worker_thread+0x181/0x13a0 kernel/workqueue.c:2296
>  kthread+0x345/0x410 kernel/kthread.c:240
>  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
> Shutting down cpus with NMI
> Dumping ftrace buffer:
>    (ftrace buffer empty)
> Kernel Offset: disabled
> Rebooting in 86400 seconds..
>
>
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@xxxxxxxxxxxxxxxxx
>
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with syzbot.

#syz fix: bpf: sockhash fix omitted bucket lock in sock_close