Re: Freeing active kobject in pps_device_destruct
From: Vegard Nossum
Date: Sat Jun 25 2016 - 14:23:32 EST
On 27 November 2015 at 05:30, Sasha Levin <sasha.levin@xxxxxxxxxx> wrote:
> Hi,
>
> Fuzzing with syzkaller on the latest -next kernel produced this error:
>
> [ 1167.390182] WARNING: CPU: 14 PID: 607 at lib/debugobjects.c:263 debug_print_object+0x1c4/0x1e0()
> (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x90
> [ 1167.392644] Modules linked in:
> [ 1167.393120] CPU: 14 PID: 607 Comm: kworker/14:1 Tainted: G W 4.4.0-rc2-next-20151126 sasha-00005-g00d303e-dirty #2651
> [ 1167.394563] Workqueue: events kobject_delayed_cleanup
> [ 1167.395226] 000000000000000e 000000006f32b107 ffff8806b83478b8 fffffffface6b5bb
> [ 1167.396254] ffff8806b8347928 ffff8806b7b80000 ffffffffb515c7a0 ffff8806b83478f8
> [ 1167.397403] ffffffffab3531d3 ffffffffaced0194 ffffed00d7068f21 ffffffffb515c7a0
> [ 1167.398499] Call Trace:
> [ 1167.398891] [<fffffffface6b5bb>] dump_stack+0x72/0xb7
> [ 1167.399516] [<ffffffffab3531d3>] warn_slowpath_common+0x113/0x140
> [ 1167.401705] [<ffffffffab3532cb>] warn_slowpath_fmt+0xcb/0x100
> [ 1167.404799] [<ffffffffaced0194>] debug_print_object+0x1c4/0x1e0
> [ 1167.406723] [<ffffffffaced1035>] __debug_check_no_obj_freed+0x215/0x7a0
> [ 1167.409634] [<ffffffffaced2b6c>] debug_check_no_obj_freed+0x2c/0x40
> [ 1167.410301] [<ffffffffab7aac4c>] kfree+0x1fc/0x2f0
> [ 1167.410734] [<ffffffffb1f7a447>] pps_device_destruct+0x107/0x110
> [ 1167.413495] [<fffffffface715ad>] kobject_delayed_cleanup+0x34d/0x3b0
> [ 1167.414049] [<ffffffffab39fa37>] process_one_work+0xab7/0x13b0
> [ 1167.417188] [<ffffffffab3a0c6d>] worker_thread+0x93d/0xd20
> [ 1167.418782] [<ffffffffab3b34a0>] kthread+0x290/0x2b0
> [ 1167.422467] [<ffffffffb4a1290f>] ret_from_fork+0x3f/0x70
Reviving this thread as I got something very close to that on latest
linus/master during boot:
EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
kobject: 'dm-0' (ffff8801ce7c37d0): kobject_release, parent
(null) (delayed 250)
kobject: 'queue' (ffff8801cf29be48): kobject_release, parent
(null) (delayed 500)
------------[ cut here ]------------
WARNING: CPU: 2 PID: 107 at lib/debugobjects.c:263 debug_print_object+0xe4/0x100
ODEBUG: free active (active state 0) object type: timer_list hint:
delayed_work_timer_fn+0x0/0x50
Modules linked in: btrfs xor raid6_pq dm_crypt dm_multipath mmc_block
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel
aes_x86_64 lrw gf128mul glue_helper ablk_helper i915 cryptd i2c_alg
scopyarea sysfillrect ahci sysimgblt fb_sys_fops e1000e sdhci_pci
libahci ptp sdhci drm pps_core wmi video jitterentropy_rng drbg
ansi_cprng
CPU: 2 PID: 107 Comm: kworker/2:2 Tainted: G W 4.7.0-rc4+ #411
Hardware name: LENOVO 2349E48/2349E48, BIOS G1ET41WW (1.16 ) 05/25/2012
Workqueue: events kobject_delayed_cleanup
ffffffff816a6ed4 ffff8800356cf940 ffffffff81672645 ffff8800356cf9b8
0000000000000000 ffff8800356cf988 ffffffff810c46b1 ffff8800356c4000
0000010782369da1 ffffed0006ad9f33 0000000000000000 ffffffff81e8e660
Call Trace:
[<ffffffff81672645>] dump_stack+0x82/0xbd
[<ffffffff810c46b1>] __warn+0x111/0x130
[<ffffffff810c4767>] warn_slowpath_fmt+0x97/0xb0
[<ffffffff816a6ed4>] debug_print_object+0xe4/0x100
[<ffffffff816a87c1>] debug_check_no_obj_freed+0x3d1/0x430
[<ffffffff8132c563>] kfree+0x103/0x2f0
[<ffffffff8163af91>] disk_release+0x141/0x180
[<ffffffff818bcc5a>] device_release+0x4a/0x100
[<ffffffff816763ac>] kobject_delayed_cleanup+0x6c/0xb0
[<ffffffff810f698d>] process_one_work+0x46d/0xa60
[<ffffffff810f700b>] worker_thread+0x8b/0x730
[<ffffffff81100fe2>] kthread+0x192/0x1b0
[<ffffffff81d6240f>] ret_from_fork+0x1f/0x40
---[ end trace f15c6ecaa58bd794 ]---
Adding 8105980k swap on /dev/mapper/ubuntu-swap_1. Priority:-1
extents:1 across:8105980k SSFS
EXT4-fs (dm-1): re-mounted. Opts: errors=remount-ro
As far as I can tell, it seems like a general problem with
kobject_delayed_cleanup() OR delayed work handling. debugobjects is
complaining about the timer used to delay the work is still "active"
(not as in hasn't fired but as in not destroyed) when disk_release()
does the kfree on the struct gendisk (the timer is
gendisk::part0::__dev::kobj::release::timer -- yes, quite a path!).
Are we missing a del_timer() somewhere in kernel/workqueue.c for
delayed work handling (e.g. process_one_work())?
Adding some more timer/workqueue/debug objects people on Cc.
Vegard