Re: floppy: GPF in floppy_rb0_cb

From: Jiri Kosina
Date: Mon Jan 25 2016 - 09:06:41 EST


On Sun, 24 Jan 2016, Dmitry Vyukov wrote:

> Hello,
>
> The following causes program causes multiple bugs and eventually machine death:
>
> // autogenerated by syzkaller (http://github.com/google/syzkaller)
> #include <unistd.h>
> #include <stdlib.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <sys/wait.h>
>
> #define N 100
>
> int main()
> {
> int i, status, pids[N];
>
> for (;;) {
> for (i = 0; i < N; i++) {
> if ((pids[i] = fork()) == 0) {
> open("/dev/fd0", O_RDWR);

Oh man, you are trying to wake up the beast everybody had hoped to have
vanish from this universe forever.

> exit(0);
> }
> }
> for (i = 0; i < N; i++) {
> while (waitpid(pids[i], &status, __WALL) != pids[i]) {
> }
> }
> }
> return 0;
> }
>
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 6 at drivers/block/floppy.c:975 schedule_bh+0x55/0x60()
> Modules linked in:
> CPU: 0 PID: 6 Comm: kworker/u8:0 Not tainted 4.4.0+ #276
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Workqueue: floppy fd_timer_workfn
> 00000000ffffffff ffff88003df97ac0 ffffffff82999e2d 0000000000000000
> ffff88003df32f80 ffffffff8687a0e0 ffff88003df97b00 ffffffff81352089
> ffffffff8335dbb5 ffffffff8687a0e0 00000000000003cf ffffffff895cae20
> Call Trace:
> [< inline >] __dump_stack lib/dump_stack.c:15
> [<ffffffff82999e2d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
> [<ffffffff81352089>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
> [<ffffffff813522b9>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
> [<ffffffff8335dbb5>] schedule_bh+0x55/0x60 drivers/block/floppy.c:975
> [<ffffffff8336e1cf>] redo_fd_request+0x173f/0x39f0 drivers/block/floppy.c:2878
> [< inline >] seek_floppy drivers/block/floppy.c:1572
> [<ffffffff8336ad6c>] floppy_ready+0x106c/0x13f0 drivers/block/floppy.c:1911
> [<ffffffff8335c9ff>] fd_timer_workfn+0xf/0x20 drivers/block/floppy.c:985
> [<ffffffff813a0836>] process_one_work+0x796/0x1440 kernel/workqueue.c:2037
> [<ffffffff813a15bb>] worker_thread+0xdb/0xfc0 kernel/workqueue.c:2171
> [<ffffffff813b4d4f>] kthread+0x23f/0x2d0 drivers/block/aoe/aoecmd.c:1303
> [<ffffffff86336fef>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:468

redo_fd_request() is complaining because previous work item hasn't
finished yet, and the whole handling is designed as single-threaded (since
commit 070ad7e793dc). So apparently floppy_wq is racing with itself. We'll
have to find out how exactly that happened, as it might be the root cause
to everything that follows.

> ---[ end trace 40047c23eabef132 ]---
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 10091 at kernel/locking/lockdep.c:3183
> __lock_acquire+0xbc8/0x4700()
> DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS)

Locked ran out of key bitmap. I.e. long convoy waiting for one process to
proceeed, I guess; very likely waiting for global floppy_lock.

> Modules linked in:
> [< inline >] process_fd_request drivers/block/floppy.c:2893
> [<ffffffff8335df06>] __floppy_read_block_0+0x196/0x260
> drivers/block/floppy.c:3822
> [<ffffffff83364b93>] floppy_revalidate+0x573/0x770 drivers/block/floppy.c:3867

And this happens while reading block #0 to perform size auto-sensing.

> [<ffffffff8186ff91>] check_disk_change+0xf1/0x130 fs/block_dev.c:1135
> [<ffffffff8335e958>] floppy_open+0x518/0x920 drivers/block/floppy.c:3713
> [<ffffffff81871c88>] __blkdev_get+0x338/0x10e0 fs/block_dev.c:1213
> [<ffffffff818732b0>] blkdev_get+0x310/0x960 fs/block_dev.c:1352
> [<ffffffff81873b05>] blkdev_open+0x1a5/0x250 fs/block_dev.c:1507
> [<ffffffff817a9c02>] do_dentry_open+0x6a2/0xcb0 fs/open.c:736
> [<ffffffff817ad2db>] vfs_open+0x17b/0x1f0 fs/open.c:853
> [< inline >] do_last fs/namei.c:3254
> [<ffffffff817e00d9>] path_openat+0xde9/0x5e30 fs/namei.c:3386
> [<ffffffff817e895e>] do_filp_open+0x18e/0x250 fs/namei.c:3421
> [<ffffffff817ada5c>] do_sys_open+0x1fc/0x420 fs/open.c:1022
> [< inline >] SYSC_open fs/open.c:1040
> [<ffffffff817adcad>] SyS_open+0x2d/0x40 fs/open.c:1035
> [<ffffffff86336c36>] entry_SYSCALL_64_fastpath+0x16/0x7a
> arch/x86/entry/entry_64.S:185
> ---[ end trace 40047c23eabef13c ]---
> CPU: 1 PID: 10091 Comm: kworker/u8:2 Tainted: G W 4.4.0+ #276
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Workqueue: floppy fd_timer_workfn
> 00000000ffffffff ffff8800607f7650 ffffffff82999e2d ffff8800607f76c0
> ffff88005b2f4740 ffffffff8642bc40 ffff8800607f7690 ffffffff81352089
> ffffffff81454e08 ffffed000c0feed4 ffffffff8642bc40 0000000000000c6f

This trace looks incompletee, right? From the stacktraces it looks like it
might be another warning from lockdep, but the actually message doesn't
seem to have made it to your trace.

> Call Trace:
> [< inline >] __dump_stack lib/dump_stack.c:15
> [<ffffffff82999e2d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
> [<ffffffff81352089>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
> [<ffffffff81352199>] warn_slowpath_fmt+0xa9/0xd0 kernel/panic.c:494
> [<ffffffff81454e08>] __lock_acquire+0xbc8/0x4700 kernel/locking/lockdep.c:3183
> [<ffffffff8145ad8c>] lock_acquire+0x1dc/0x430 kernel/locking/lockdep.c:3585
> [< inline >] __raw_spin_lock_irqsave
> include/linux/spinlock_api_smp.h:112
> [<ffffffff863365cf>] _raw_spin_lock_irqsave+0x9f/0xd0
> kernel/locking/spinlock.c:159
> [<ffffffff8143a678>] complete+0x18/0x70 kernel/sched/completion.c:33
> [<ffffffff8335dd04>] floppy_rb0_cb+0x74/0xe0 drivers/block/floppy.c:3785
> [<ffffffff828f41f7>] bio_endio+0x117/0x200 block/bio.c:1761
> [< inline >] req_bio_endio block/blk-core.c:155
> [<ffffffff82910533>] blk_update_request+0x1c3/0xbc0 block/blk-core.c:2632
> [<ffffffff82910f5a>] blk_update_bidi_request+0x2a/0x160 block/blk-core.c:2686
> [<ffffffff82913ee0>] __blk_end_bidi_request+0x30/0x60 block/blk-core.c:2802
> [<ffffffff82913f37>] __blk_end_request+0x27/0x30 block/blk-core.c:2903
> [<ffffffff83360076>] floppy_end_request+0x96/0x2a0 drivers/block/floppy.c:2213
> [<ffffffff833606d2>] request_done+0x452/0x6d0 drivers/block/floppy.c:2266
> [< inline >] seek_floppy drivers/block/floppy.c:1571
> [<ffffffff8336ad40>] floppy_ready+0x1040/0x13f0 drivers/block/floppy.c:1911
> [<ffffffff8335c9ff>] fd_timer_workfn+0xf/0x20 drivers/block/floppy.c:985
> [<ffffffff813a0836>] process_one_work+0x796/0x1440 kernel/workqueue.c:2037
> [<ffffffff813a15bb>] worker_thread+0xdb/0xfc0 kernel/workqueue.c:2171
> [<ffffffff813b4d4f>] kthread+0x23f/0x2d0 drivers/block/aoe/aoecmd.c:1303
> [<ffffffff86336fef>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:468
> ---[ end trace 40047c23eabef13d ]---
> BUG: unable to handle kernel NULL pointer dereference at 000000000000036b
> IP: [<000000000000036b>] 0x36b
> PGD 651b5067 PUD 63062067 PMD 0
> Oops: 0010 [#1] SMP DEBUG_PAGEALLOC KASAN
> Modules linked in:
> CPU: 1 PID: 10091 Comm: kworker/u8:2 Tainted: G W 4.4.0+ #276
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> Workqueue: floppy fd_timer_workfn
> task: ffff88005b2f4740 ti: ffff8800607f0000 task.ti: ffff8800607f0000
> RIP: 0010:[<000000000000036b>] [<000000000000036b>] 0x36b
> RSP: 0018:ffff8800607f7920 EFLAGS: 00010093
> RAX: ffff88005eb775c8 RBX: 000000005eafc740 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff88005eb775c8
> RBP: ffff8800607f7968 R08: 0000000000000000 R09: 0000000000000000
> R10: 000000000000036b R11: ffffed000fffec09 R12: ffff88005eb775b8
> R13: dffffc0000000000 R14: ffff88005eb77608 R15: 0000000000000000
> FS: 0000000000000000(0000) GS:ffff88003ed00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 000000000000036b CR3: 0000000065243000 CR4: 00000000000006e0
> Stack:
> ffffffff81438d28 ffff88005eb775c8 0000000100000086 0000000300000000
> ffff88005eb77578 ffff88005eb77580 0000000000000086 dffffc0000000000
> 0000000000001000 ffff8800607f7978 ffffffff81438e1e ffff8800607f79a0
> Call Trace:
> [<ffffffff81438e1e>] __wake_up_locked+0xe/0x10 kernel/sched/wait.c:105
> [<ffffffff8143a6ae>] complete+0x4e/0x70 kernel/sched/completion.c:35

And here we seem to be trying to complete already freed completion.

There might be either race regarding block #0 reading (commits 7b7b68bba5
and 6314a108ec1), or the 070ad7e793dc rework itself.

Do you happen to have (serial) console logs from the kernel before it
crashed? Does it by any chance contain any messages from the floppy
driver?

Thanks,

--
Jiri Kosina
SUSE Labs