Re: floppy: GPF in floppy_rb0_cb

From: Dmitry Vyukov
Date: Mon Jan 25 2016 - 10:35:19 EST


On Mon, Jan 25, 2016 at 3:06 PM, Jiri Kosina <jikos@xxxxxxxxxx> wrote:
> On Sun, 24 Jan 2016, Dmitry Vyukov wrote:
>
>> Hello,
>>
>> The following causes program causes multiple bugs and eventually machine death:
>>
>> // autogenerated by syzkaller (http://github.com/google/syzkaller)
>> #include <unistd.h>
>> #include <stdlib.h>
>> #include <sys/types.h>
>> #include <sys/stat.h>
>> #include <fcntl.h>
>> #include <sys/wait.h>
>>
>> #define N 100
>>
>> int main()
>> {
>> int i, status, pids[N];
>>
>> for (;;) {
>> for (i = 0; i < N; i++) {
>> if ((pids[i] = fork()) == 0) {
>> open("/dev/fd0", O_RDWR);
>
> Oh man, you are trying to wake up the beast everybody had hoped to have
> vanish from this universe forever.
>
>> exit(0);
>> }
>> }
>> for (i = 0; i < N; i++) {
>> while (waitpid(pids[i], &status, __WALL) != pids[i]) {
>> }
>> }
>> }
>> return 0;
>> }
>>
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 6 at drivers/block/floppy.c:975 schedule_bh+0x55/0x60()
>> Modules linked in:
>> CPU: 0 PID: 6 Comm: kworker/u8:0 Not tainted 4.4.0+ #276
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> Workqueue: floppy fd_timer_workfn
>> 00000000ffffffff ffff88003df97ac0 ffffffff82999e2d 0000000000000000
>> ffff88003df32f80 ffffffff8687a0e0 ffff88003df97b00 ffffffff81352089
>> ffffffff8335dbb5 ffffffff8687a0e0 00000000000003cf ffffffff895cae20
>> Call Trace:
>> [< inline >] __dump_stack lib/dump_stack.c:15
>> [<ffffffff82999e2d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
>> [<ffffffff81352089>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
>> [<ffffffff813522b9>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
>> [<ffffffff8335dbb5>] schedule_bh+0x55/0x60 drivers/block/floppy.c:975
>> [<ffffffff8336e1cf>] redo_fd_request+0x173f/0x39f0 drivers/block/floppy.c:2878
>> [< inline >] seek_floppy drivers/block/floppy.c:1572
>> [<ffffffff8336ad6c>] floppy_ready+0x106c/0x13f0 drivers/block/floppy.c:1911
>> [<ffffffff8335c9ff>] fd_timer_workfn+0xf/0x20 drivers/block/floppy.c:985
>> [<ffffffff813a0836>] process_one_work+0x796/0x1440 kernel/workqueue.c:2037
>> [<ffffffff813a15bb>] worker_thread+0xdb/0xfc0 kernel/workqueue.c:2171
>> [<ffffffff813b4d4f>] kthread+0x23f/0x2d0 drivers/block/aoe/aoecmd.c:1303
>> [<ffffffff86336fef>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:468
>
> redo_fd_request() is complaining because previous work item hasn't
> finished yet, and the whole handling is designed as single-threaded (since
> commit 070ad7e793dc). So apparently floppy_wq is racing with itself. We'll
> have to find out how exactly that happened, as it might be the root cause
> to everything that follows.
>
>> ---[ end trace 40047c23eabef132 ]---
>> ------------[ cut here ]------------
>> WARNING: CPU: 1 PID: 10091 at kernel/locking/lockdep.c:3183
>> __lock_acquire+0xbc8/0x4700()
>> DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS)
>
> Locked ran out of key bitmap. I.e. long convoy waiting for one process to
> proceeed, I guess; very likely waiting for global floppy_lock.
>
>> Modules linked in:
>> [< inline >] process_fd_request drivers/block/floppy.c:2893
>> [<ffffffff8335df06>] __floppy_read_block_0+0x196/0x260
>> drivers/block/floppy.c:3822
>> [<ffffffff83364b93>] floppy_revalidate+0x573/0x770 drivers/block/floppy.c:3867
>
> And this happens while reading block #0 to perform size auto-sensing.
>
>> [<ffffffff8186ff91>] check_disk_change+0xf1/0x130 fs/block_dev.c:1135
>> [<ffffffff8335e958>] floppy_open+0x518/0x920 drivers/block/floppy.c:3713
>> [<ffffffff81871c88>] __blkdev_get+0x338/0x10e0 fs/block_dev.c:1213
>> [<ffffffff818732b0>] blkdev_get+0x310/0x960 fs/block_dev.c:1352
>> [<ffffffff81873b05>] blkdev_open+0x1a5/0x250 fs/block_dev.c:1507
>> [<ffffffff817a9c02>] do_dentry_open+0x6a2/0xcb0 fs/open.c:736
>> [<ffffffff817ad2db>] vfs_open+0x17b/0x1f0 fs/open.c:853
>> [< inline >] do_last fs/namei.c:3254
>> [<ffffffff817e00d9>] path_openat+0xde9/0x5e30 fs/namei.c:3386
>> [<ffffffff817e895e>] do_filp_open+0x18e/0x250 fs/namei.c:3421
>> [<ffffffff817ada5c>] do_sys_open+0x1fc/0x420 fs/open.c:1022
>> [< inline >] SYSC_open fs/open.c:1040
>> [<ffffffff817adcad>] SyS_open+0x2d/0x40 fs/open.c:1035
>> [<ffffffff86336c36>] entry_SYSCALL_64_fastpath+0x16/0x7a
>> arch/x86/entry/entry_64.S:185
>> ---[ end trace 40047c23eabef13c ]---
>> CPU: 1 PID: 10091 Comm: kworker/u8:2 Tainted: G W 4.4.0+ #276
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> Workqueue: floppy fd_timer_workfn
>> 00000000ffffffff ffff8800607f7650 ffffffff82999e2d ffff8800607f76c0
>> ffff88005b2f4740 ffffffff8642bc40 ffff8800607f7690 ffffffff81352089
>> ffffffff81454e08 ffffed000c0feed4 ffffffff8642bc40 0000000000000c6f
>
> This trace looks incompletee, right? From the stacktraces it looks like it
> might be another warning from lockdep, but the actually message doesn't
> seem to have made it to your trace.
>
>> Call Trace:
>> [< inline >] __dump_stack lib/dump_stack.c:15
>> [<ffffffff82999e2d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
>> [<ffffffff81352089>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
>> [<ffffffff81352199>] warn_slowpath_fmt+0xa9/0xd0 kernel/panic.c:494
>> [<ffffffff81454e08>] __lock_acquire+0xbc8/0x4700 kernel/locking/lockdep.c:3183
>> [<ffffffff8145ad8c>] lock_acquire+0x1dc/0x430 kernel/locking/lockdep.c:3585
>> [< inline >] __raw_spin_lock_irqsave
>> include/linux/spinlock_api_smp.h:112
>> [<ffffffff863365cf>] _raw_spin_lock_irqsave+0x9f/0xd0
>> kernel/locking/spinlock.c:159
>> [<ffffffff8143a678>] complete+0x18/0x70 kernel/sched/completion.c:33
>> [<ffffffff8335dd04>] floppy_rb0_cb+0x74/0xe0 drivers/block/floppy.c:3785
>> [<ffffffff828f41f7>] bio_endio+0x117/0x200 block/bio.c:1761
>> [< inline >] req_bio_endio block/blk-core.c:155
>> [<ffffffff82910533>] blk_update_request+0x1c3/0xbc0 block/blk-core.c:2632
>> [<ffffffff82910f5a>] blk_update_bidi_request+0x2a/0x160 block/blk-core.c:2686
>> [<ffffffff82913ee0>] __blk_end_bidi_request+0x30/0x60 block/blk-core.c:2802
>> [<ffffffff82913f37>] __blk_end_request+0x27/0x30 block/blk-core.c:2903
>> [<ffffffff83360076>] floppy_end_request+0x96/0x2a0 drivers/block/floppy.c:2213
>> [<ffffffff833606d2>] request_done+0x452/0x6d0 drivers/block/floppy.c:2266
>> [< inline >] seek_floppy drivers/block/floppy.c:1571
>> [<ffffffff8336ad40>] floppy_ready+0x1040/0x13f0 drivers/block/floppy.c:1911
>> [<ffffffff8335c9ff>] fd_timer_workfn+0xf/0x20 drivers/block/floppy.c:985
>> [<ffffffff813a0836>] process_one_work+0x796/0x1440 kernel/workqueue.c:2037
>> [<ffffffff813a15bb>] worker_thread+0xdb/0xfc0 kernel/workqueue.c:2171
>> [<ffffffff813b4d4f>] kthread+0x23f/0x2d0 drivers/block/aoe/aoecmd.c:1303
>> [<ffffffff86336fef>] ret_from_fork+0x3f/0x70 arch/x86/entry/entry_64.S:468
>> ---[ end trace 40047c23eabef13d ]---
>> BUG: unable to handle kernel NULL pointer dereference at 000000000000036b
>> IP: [<000000000000036b>] 0x36b
>> PGD 651b5067 PUD 63062067 PMD 0
>> Oops: 0010 [#1] SMP DEBUG_PAGEALLOC KASAN
>> Modules linked in:
>> CPU: 1 PID: 10091 Comm: kworker/u8:2 Tainted: G W 4.4.0+ #276
>> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>> Workqueue: floppy fd_timer_workfn
>> task: ffff88005b2f4740 ti: ffff8800607f0000 task.ti: ffff8800607f0000
>> RIP: 0010:[<000000000000036b>] [<000000000000036b>] 0x36b
>> RSP: 0018:ffff8800607f7920 EFLAGS: 00010093
>> RAX: ffff88005eb775c8 RBX: 000000005eafc740 RCX: 0000000000000000
>> RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffff88005eb775c8
>> RBP: ffff8800607f7968 R08: 0000000000000000 R09: 0000000000000000
>> R10: 000000000000036b R11: ffffed000fffec09 R12: ffff88005eb775b8
>> R13: dffffc0000000000 R14: ffff88005eb77608 R15: 0000000000000000
>> FS: 0000000000000000(0000) GS:ffff88003ed00000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> CR2: 000000000000036b CR3: 0000000065243000 CR4: 00000000000006e0
>> Stack:
>> ffffffff81438d28 ffff88005eb775c8 0000000100000086 0000000300000000
>> ffff88005eb77578 ffff88005eb77580 0000000000000086 dffffc0000000000
>> 0000000000001000 ffff8800607f7978 ffffffff81438e1e ffff8800607f79a0
>> Call Trace:
>> [<ffffffff81438e1e>] __wake_up_locked+0xe/0x10 kernel/sched/wait.c:105
>> [<ffffffff8143a6ae>] complete+0x4e/0x70 kernel/sched/completion.c:35
>
> And here we seem to be trying to complete already freed completion.
>
> There might be either race regarding block #0 reading (commits 7b7b68bba5
> and 6314a108ec1), or the 070ad7e793dc rework itself.
>
> Do you happen to have (serial) console logs from the kernel before it
> crashed? Does it by any chance contain any messages from the floppy
> driver?

Hi,

Here is a complete log:
https://gist.githubusercontent.com/dvyukov/bd7923ece774521d6136/raw/5fae6326d90360d8bd49d6e06e0b3a7f132f00ac/gistfile1.txt
It is intermixed with executed programs, but you can grep for "] floppy".
There seems to be a bunch of "floppy: error -5 while reading block 0" errors.