Re: gadgetfs regression (NULL ptr deref) since v4.4-rc7

From: Marek Szyprowski
Date: Mon Feb 08 2016 - 05:19:47 EST


Hello,

On 2016-02-08 10:46, Ruslan Bilovol wrote:
Hi Vegard,

On Mon, Feb 8, 2016 at 1:27 AM, Vegard Nossum <vegard.nossum@xxxxxxxxxx> wrote:
Hi,

Using gadgetfs on latest mainline, I get the following NULL pointer
dereference:

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff81138e89>] __list_del_entry+0x29/0xc0
PGD 34f067 PUD 355067 PMD 0
Oops: 0000 [#1] DEBUG_PAGEALLOC
CPU: 0 PID: 35 Comm: afl-fuzz Not tainted 4.5.0-rc2 #1
task: ffff8800002b1ec0 ti: ffff88000033c000 task.ti: ffff88000033c000
RIP: 0010:[<ffffffff81138e89>] [<ffffffff81138e89>]
__list_del_entry+0x29/0xc0
RSP: 0018:ffff88000033fe30 EFLAGS: 00010207
RAX: 0000000000000000 RBX: ffffffff8138c108 RCX: dead000000000200
RDX: 0000000000000000 RSI: 0000000000000061 RDI: ffffffff8138c108
RBP: ffff88000033fe30 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: ffffffff8138b928
R13: ffffffff8138c040 R14: ffff88000ec0da20 R15: ffff88000033ff58
FS: 00007ffff7ff2740(0000) GS:ffffffff8135d000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000000335000 CR4: 00000000001406b0
Stack:
ffff88000033fe48 ffffffff81138f2d ffffffff8138bbd0 ffff88000033fe70
ffffffff811c880b ffff8800002f3000 ffff88000ec109a0 ffff880000298aa0
ffff88000033fe88 ffffffff811ce040 ffff88000034cdc0 ffff88000033feb8
Call Trace:
[<ffffffff81138f2d>] list_del+0xd/0x30
[<ffffffff811c880b>] usb_gadget_unregister_driver+0xdb/0xf0
[<ffffffff811ce040>] dev_release+0x20/0x60
[<ffffffff810b464c>] __fput+0x4c/0x130
[<ffffffff810b4769>] ____fput+0x9/0x10
[<ffffffff81048577>] task_work_run+0x67/0xa0
[<ffffffff81000dcf>] exit_to_usermode_loop+0x8f/0xa0
[<ffffffff8100105d>] syscall_return_slowpath+0x3d/0x40
[<ffffffff81278b89>] int_ret_from_sys_call+0x25/0x8f
Code: ff ff 55 48 8b 07 48 b9 00 01 00 00 00 00 ad de 48 8b 57 08 48 89 e5
48 39 c8 74 29 48 b9 00 02 00 00 00 00 ad de 48 39 ca 74 3a <4c> 8b 02 4c 39
c7 75 52 4c 8b 40 08 4c 39 c7 75 66 48 89 50 08
RIP [<ffffffff81138e89>] __list_del_entry+0x29/0xc0
RSP <ffff88000033fe30>
CR2: 0000000000000000
---[ end trace e6cfe1de661dcffe ]---

Reverting this commit fixes the problem for me:

commit 855ed04a3758b205e84b269f92d26ab36ed8e2f7
Author: Ruslan Bilovol <ruslan.bilovol@xxxxxxxxx>
Date: Mon Nov 23 09:56:38 2015 +0100

usb: gadget: udc-core: independent registration of gadgets and gadget
drivers

Though I am still seeing some occasional lockups. Thanks,

Original version of this patch had checking of input parameters
(see change in usb_gadget_unregister_driver at
https://lkml.org/lkml/2015/6/22/559) but this approach was
rejected by Alan Stern many times so final version pushed by
Marek Szyprowski doesn't have it, and we have this NULL pointer
dereference.
But this means there is a bug in GadgetFS that was hidden
previously: it tries to deregister gadget driver that was not previously
registered. Previous implementation was returning -ENODEV
retval, current one just does NULL pointer dereference.

At least need to fix GadgetFS that seems to by buggy in
gadget driver unregistering, and I still say we need to nave
input parameters checking in externally visible functions like
usb_gadget_unregister_driver().

Meanwhile you can try following patch that returns checking of input
parameters back and see if it helps.

Best regards,
Ruslan

--------

diff --git a/drivers/usb/gadget/udc/udc-core.c
b/drivers/usb/gadget/udc/udc-core.c
index fd73a3e..f1d375f 100644
--- a/drivers/usb/gadget/udc/udc-core.c
+++ b/drivers/usb/gadget/udc/udc-core.c
@@ -597,9 +597,16 @@ int usb_gadget_unregister_driver(struct
usb_gadget_driver *driver)
}

if (ret) {
- list_del(&driver->pending);
- ret = 0;
+ struct usb_gadget_driver *tmp;
+
+ list_for_each_entry(tmp, &gadget_driver_pending_list, pending)
+ if (tmp == driver) {
+ list_del(&driver->pending);
+ ret = 0;
+ break;
+ }
}
+
mutex_unlock(&udc_lock);
return ret;
}

Sorry, but this is not the solution. If gadgetfs is broken, then it
should be fixed. There is no point in adding workarounds in core just
because there are drivers that can be broken.

I was still not able to reproduce this issue (tried with dwc2, gadgetfs
and ptp gadget daemon). Vegard could you prepare a step-by-step
instruction how to reproduce this problem?

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland