Re: gadgetfs regression (NULL ptr deref) since v4.4-rc7

From: Vegard Nossum
Date: Mon Feb 08 2016 - 06:07:39 EST


(fixed Felipe Balbi e-mail address)

On 02/08/2016 11:19 AM, Marek Szyprowski wrote:
Hello,

On 2016-02-08 10:46, Ruslan Bilovol wrote:
Hi Vegard,

On Mon, Feb 8, 2016 at 1:27 AM, Vegard Nossum
<vegard.nossum@xxxxxxxxxx> wrote:
Hi,

Using gadgetfs on latest mainline, I get the following NULL pointer
dereference:

BUG: unable to handle kernel NULL pointer dereference at
(null)
IP: [<ffffffff81138e89>] __list_del_entry+0x29/0xc0
PGD 34f067 PUD 355067 PMD 0
Oops: 0000 [#1] DEBUG_PAGEALLOC
CPU: 0 PID: 35 Comm: afl-fuzz Not tainted 4.5.0-rc2 #1
task: ffff8800002b1ec0 ti: ffff88000033c000 task.ti: ffff88000033c000
RIP: 0010:[<ffffffff81138e89>] [<ffffffff81138e89>]
__list_del_entry+0x29/0xc0
RSP: 0018:ffff88000033fe30 EFLAGS: 00010207
RAX: 0000000000000000 RBX: ffffffff8138c108 RCX: dead000000000200
RDX: 0000000000000000 RSI: 0000000000000061 RDI: ffffffff8138c108
RBP: ffff88000033fe30 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: ffffffff8138b928
R13: ffffffff8138c040 R14: ffff88000ec0da20 R15: ffff88000033ff58
FS: 00007ffff7ff2740(0000) GS:ffffffff8135d000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000000335000 CR4: 00000000001406b0
Stack:
ffff88000033fe48 ffffffff81138f2d ffffffff8138bbd0 ffff88000033fe70
ffffffff811c880b ffff8800002f3000 ffff88000ec109a0 ffff880000298aa0
ffff88000033fe88 ffffffff811ce040 ffff88000034cdc0 ffff88000033feb8
Call Trace:
[<ffffffff81138f2d>] list_del+0xd/0x30
[<ffffffff811c880b>] usb_gadget_unregister_driver+0xdb/0xf0
[<ffffffff811ce040>] dev_release+0x20/0x60
[<ffffffff810b464c>] __fput+0x4c/0x130
[<ffffffff810b4769>] ____fput+0x9/0x10
[<ffffffff81048577>] task_work_run+0x67/0xa0
[<ffffffff81000dcf>] exit_to_usermode_loop+0x8f/0xa0
[<ffffffff8100105d>] syscall_return_slowpath+0x3d/0x40
[<ffffffff81278b89>] int_ret_from_sys_call+0x25/0x8f
Code: ff ff 55 48 8b 07 48 b9 00 01 00 00 00 00 ad de 48 8b 57 08 48
89 e5
48 39 c8 74 29 48 b9 00 02 00 00 00 00 ad de 48 39 ca 74 3a <4c> 8b
02 4c 39
c7 75 52 4c 8b 40 08 4c 39 c7 75 66 48 89 50 08
RIP [<ffffffff81138e89>] __list_del_entry+0x29/0xc0
RSP <ffff88000033fe30>
CR2: 0000000000000000
---[ end trace e6cfe1de661dcffe ]---

Reverting this commit fixes the problem for me:

commit 855ed04a3758b205e84b269f92d26ab36ed8e2f7
Author: Ruslan Bilovol <ruslan.bilovol@xxxxxxxxx>
Date: Mon Nov 23 09:56:38 2015 +0100

usb: gadget: udc-core: independent registration of gadgets and
gadget
drivers

Though I am still seeing some occasional lockups. Thanks,

Original version of this patch had checking of input parameters
(see change in usb_gadget_unregister_driver at
https://lkml.org/lkml/2015/6/22/559) but this approach was
rejected by Alan Stern many times so final version pushed by
Marek Szyprowski doesn't have it, and we have this NULL pointer
dereference.
But this means there is a bug in GadgetFS that was hidden
previously: it tries to deregister gadget driver that was not previously
registered. Previous implementation was returning -ENODEV
retval, current one just does NULL pointer dereference.

At least need to fix GadgetFS that seems to by buggy in
gadget driver unregistering, and I still say we need to nave
input parameters checking in externally visible functions like
usb_gadget_unregister_driver().

Meanwhile you can try following patch that returns checking of input
parameters back and see if it helps.

Best regards,
Ruslan

--------

diff --git a/drivers/usb/gadget/udc/udc-core.c
b/drivers/usb/gadget/udc/udc-core.c
index fd73a3e..f1d375f 100644
--- a/drivers/usb/gadget/udc/udc-core.c
+++ b/drivers/usb/gadget/udc/udc-core.c
@@ -597,9 +597,16 @@ int usb_gadget_unregister_driver(struct
usb_gadget_driver *driver)
}

if (ret) {
- list_del(&driver->pending);
- ret = 0;
+ struct usb_gadget_driver *tmp;
+
+ list_for_each_entry(tmp, &gadget_driver_pending_list,
pending)
+ if (tmp == driver) {
+ list_del(&driver->pending);
+ ret = 0;
+ break;
+ }
}
+
mutex_unlock(&udc_lock);
return ret;
}


So the patch *seems* to fix the problem I was seeing. However...

Sorry, but this is not the solution. If gadgetfs is broken, then it
should be fixed. There is no point in adding workarounds in core just
because there are drivers that can be broken.

Even if it's not a fix per se, why not modify it to throw a warning
instead of crashing? If I understand correctly, something like

+WARN_ON_ONCE(ret);

after the loop in Ruslan's patch should be enough. I added a BUG_ON(ret)
(instead of WARN_ON_ONCE) and I got this:

------------[ cut here ]------------
kernel BUG at drivers/usb/gadget/udc/udc-core.c:622!
invalid opcode: 0000 [#1] DEBUG_PAGEALLOC KASAN
CPU: 0 PID: 36 Comm: afl-fuzz Not tainted 4.5.0-rc2 #1
task: ffff8800003b6900 ti: ffff88000c840000 task.ti: ffff88000c840000
RIP: 0010:[<ffffffff8152395b>] [<ffffffff8152395b>] usb_gadget_unregister_driver+0x18b/0x2d0
RSP: 0018:ffff88000c847de8 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffffffff81a117d8 RCX: 1ffff10001908fa6
RDX: 0000000000000001 RSI: 0000000000000061 RDI: ffff88000039c418
RBP: ffff88000c847e10 R08: 0000000000000246 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81a118e0
R13: ffffffff81a13f40 R14: dffffc0000000000 R15: ffff88000c83ca00
FS: 00007ffff7ff2740(0000) GS:ffffffff8193f000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffff78c53a0 CR3: 000000000c850000 CR4: 00000000001406b0
Stack:
ffffffff817f62a0 ffff880000277a40 ffff88000c83ca10 ffff88000c83ca20
ffff88000c83ca18 ffff88000c847e30 ffffffff81533bb4 ffff88000cc25758
ffff88000c83ca10 ffff88000c847e88 ffffffff811f0ccb 00007ffff780f0c0
Call Trace:
[<ffffffff81533bb4>] dev_release+0x44/0x110
[<ffffffff811f0ccb>] __fput+0x11b/0x490
[<ffffffff811f10a9>] ____fput+0x9/0x10
[<ffffffff810c8881>] task_work_run+0xf1/0x190
[<ffffffff811ea9ea>] ? filp_close+0x8a/0xe0
[<ffffffff81001c3c>] exit_to_usermode_loop+0xec/0x100
[<ffffffff81002531>] syscall_return_slowpath+0x91/0xc0
[<ffffffff817a4309>] int_ret_from_sys_call+0x25/0x8f
Code: f8 48 c1 e8 03 42 80 3c 20 00 0f 85 1b 01 00 00 48 8b 83 c8 00 00 00 48 3d a0 18 a1 81 48 8d 98 38 ff ff ff 75 be e8 45 b6 e6 ff <0f> 0b e8 3e b6 e6 ff 48 89 df e8 26 d0 ff ff 48 8d 7b 08 48 b8
RIP [<ffffffff8152395b>] usb_gadget_unregister_driver+0x18b/0x2d0
RSP <ffff88000c847de8>
---[ end trace 875139a9f2b43c09 ]---

I was still not able to reproduce this issue (tried with dwc2, gadgetfs
and ptp gadget daemon). Vegard could you prepare a step-by-step
instruction how to reproduce this problem?

It's a bit complicated to reproduce (and not 100% deterministic) since I
am running into this using a fuzzer.

But I am just using dummy_hcd in the following way:

- mount gadgetfs
- open dummy_hcd file
- read/write some data
- close dummy_hcd file
- unmount gadgetfs

I will write back if I find an easy way to reproduce it. In the meantime
I can test patches easily.


Vegard