NULL pointer dereference in xhci_free_dev

From: Frantisek Hrbata
Date: Tue May 07 2013 - 10:03:47 EST


Hi,

there is a NULL pointer dereference in xhci_free_dev if
xhci_alloc_dev timeouts while waiting for a slot. Fedora has several bugs
reporting this problem.

https://bugzilla.redhat.com/show_bug.cgi?id=957500
https://bugzilla.redhat.com/show_bug.cgi?id=959016
https://bugzilla.redhat.com/show_bug.cgi?id=921659
https://bugzilla.redhat.com/show_bug.cgi?id=892935

I also found one mention about this on lkml without any reply

Date: Fri, 14 Dec 2012 22:54:28 +0100
From: Heinz Diehl <htd@xxxxxxxxxx>
To: linux-kernel@xxxxxxxxxxxxxxx
Cc: stable@xxxxxxxxxxxxxxx
Subject: [3.6.10] Null pointer dereference (xhci)
Message-ID: <20121214215428.GA4465@xxxxxxxxxxxxxxxxx>
https://lkml.org/lkml/2012/12/14/369

This was first seen on 3.6.7 and it is also reported for 3.9.0. IMHO the current
kernel has the same problem.

Here follows more info for one of the oops. It's for x86, but other reports are
for x86_64.

vanilla kernel 3.9.0 drivers/usb/host/xhci.c

3110 void xhci_free_dev(struct usb_hcd *hcd, struct usb_device *udev)
3111 {
3112 struct xhci_hcd *xhci = hcd_to_xhci(hcd);
3113 struct xhci_virt_device *virt_dev;
3114 unsigned long flags;
3115 u32 state;
3116 int i, ret;

3117 ret = xhci_check_args(hcd, udev, NULL, 0, true, __func__);
3118 /* If the host is halted due to driver unload, we still need to free the
3119 * device.
3120 */
3121 if (ret <= 0 && ret != -ENODEV)
3122 return;

3123 virt_dev = xhci->devs[udev->slot_id];

3124 /* Stop any wayward timer functions (which may grab the lock) */
3125 for (i = 0; i < 31; ++i) {
3126 virt_dev->eps[i].ep_state &= ~EP_HALT_PENDING;
3127 del_timer_sync(&virt_dev->eps[i].stop_cmd_timer);
3128 }

The problem is that virt_dev could be NULL(line 3123) and we get NULL deref.
while accessing endpoints(line 3126).

Linux version 3.9.0-0.rc8.git0.2.fc19.i686.PAE
BUG: unable to handle kernel NULL pointer dereference at 00000024
IP: [<c0806e90>] xhci_free_dev+0x60/0x130
*pdpt = 0000000000000000 *pde = f000eef3f000eef3
Oops: 0002 [#1] SMP
Modules linked in: ebtable_nat ipt_MASQUERADE nf_conntrack_netbios_ns nf_conntrack_broadcast ip6table_nat nf_nat_ipv6 ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack bnep bluetooth rfkill ebtable_filter ebtables ip6table_filter ip6_tables iTCO_wdt iTCO_vendor_support ppdev snd_ice1724 snd_ak4113 snd_pt2258 snd_ak4114 mperf snd_i2c snd_ice17xx_ak4xxx coretemp snd_ak4xxx_adda crc32_pclmul snd_rawmidi crc32c_intel snd_ac97_codec ac97_bus serio_raw microcode i2c_i801 snd_seq snd_seq_device lpc_ich snd_pcm mfd_core snd_page_alloc snd_timer snd soundcore atl1c mei parport_pc parport vhost_net tun macvtap macvlan kvm_intel binfmt_misc kvm usb_storage i915 i2c_algo_bit drm_kms_helper drm i2c_core video uinput
Pid: 34, comm: khubd Not tainted 3.9.0-0.rc8.git0.2.fc19.i686.PAE #1 To Be Filled By O.E.M. To Be Filled By O.E.M./H61M/U3S3
EIP: 0060:[<c0806e90>] EFLAGS: 00010246 CPU: 0
EIP is at xhci_free_dev+0x60/0x130
EAX: 00000000 EBX: 0000001f ECX: f12f7000 EDX: 00000000
ESI: ec2f4000 EDI: 0000003c EBP: f1323ec8 ESP: f1323ea8
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 80050033 CR2: 00000024 CR3: 00d0f000 CR4: 000407f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process khubd (pid: 34, ti=f1322000 task=f122d940 task.ti=f1322000)
Stack:
00000000 00000001 c09f83d8 f09f4e00 f12f7000 ec2f4000 ffffff00 f0aec034
f1323f6c c07d97b4 00000001 c0b7f660 00000004 c0c00540 f1323f64 c09911c6
f75ba2c0 f1323efc f122d940 f0aec0a0 00000064 f122d940 f0aafe1c f09f4e00
Call Trace:
[<c07d97b4>] hub_thread+0x704/0x1520
[<c09911c6>] ? __schedule+0x366/0x780
[<c046ccf0>] ? wake_up_bit+0x20/0x20
[<c07d90b0>] ? hub_port_debounce+0x140/0x140
[<c046c2b4>] kthread+0x94/0xa0
[<c0999d77>] ret_from_kernel_thread+0x1b/0x28
[<c046c220>] ? insert_kthread_work+0x30/0x30
Code: e8 16 ba ff ff 83 f8 ed 0f 85 bd 00 00 00 8b 86 a8 02 00 00 bb 1f 00 00 00 8b 55 f0 8b 94 82 a4 00 00 00 8d 7a 3c 90 8d 74 26 00 <83> 67 e8 fb 89 f8 81 c7 8c 00 00 00 e8 bf 4f c5 ff 83 eb 01 75
EIP: [<c0806e90>] xhci_free_dev+0x60/0x130 SS:ESP 0068:f1323ea8
CR2: 0000000000000024


Code: e8 16 ba ff ff 83 f8 ed 0f 85 bd 00 00 00 8b 86 a8 02 00 00 bb 1f 00 00 00 8b 55 f0 8b 94 82 a4 00 00 00 8d 7a 3c 90 8d 74 26 00 <83> 67 e8 fb 89 f8 81 c7 8c 00 00 00 e8 bf 4f c5 ff 83 eb 01 75
All code
========
0: e8 16 ba ff ff call 0xffffba1b
5: 83 f8 ed cmp $0xffffffed,%eax
8: 0f 85 bd 00 00 00 jne 0xcb
e: 8b 86 a8 02 00 00 mov 0x2a8(%esi),%eax
14: bb 1f 00 00 00 mov $0x1f,%ebx
19: 8b 55 f0 mov -0x10(%ebp),%edx
1c: 8b 94 82 a4 00 00 00 mov 0xa4(%edx,%eax,4),%edx
23: 8d 7a 3c lea 0x3c(%edx),%edi
26: 90 nop
27: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
2b:* 83 67 e8 fb andl $0xfffffffb,-0x18(%edi) <-- trapping instruction
2f: 89 f8 mov %edi,%eax
31: 81 c7 8c 00 00 00 add $0x8c,%edi
37: e8 bf 4f c5 ff call 0xffc54ffb
3c: 83 eb 01 sub $0x1,%ebx
3f: 75 .byte 0x75

Code starting with the faulting instruction
===========================================
0: 83 67 e8 fb andl $0xfffffffb,-0x18(%edi)
4: 89 f8 mov %edi,%eax
6: 81 c7 8c 00 00 00 add $0x8c,%edi
c: e8 bf 4f c5 ff call 0xffc54fd0
11: 83 eb 01 sub $0x1,%ebx
14: 75 .byte 0x75


http://download.eng.brq.redhat.com/pub/fedora/linux//development/19/i386/debug/k/kernel-PAE-debuginfo-3.9.0-0.rc8.git0.2.fc19.i686.rpm

$ objdump -t vmlinux | grep xhci_free_dev
c08064d0 g F .text 0000003d xhci_free_device_endpoint_resources
c0806e30 g F .text 00000128 xhci_free_dev

$ objdump -d --start-address=0xc0806e30 --stop-address=0xc0806f60 vmlinux

vmlinux: file format elf32-i386


Disassembly of section .text:

c0806e30 <xhci_free_dev>:
c0806e30: 55 push %ebp
c0806e31: 89 e5 mov %esp,%ebp
c0806e33: 57 push %edi
c0806e34: 56 push %esi
c0806e35: 53 push %ebx
c0806e36: 83 ec 14 sub $0x14,%esp
c0806e39: e8 d6 35 19 00 call c099a414 <mcount>
c0806e3e: 8b 88 08 01 00 00 mov 0x108(%eax),%ecx

ecx = xhci

c0806e44: c7 44 24 08 d8 83 9f movl $0xc09f83d8,0x8(%esp)
c0806e4b: c0
c0806e4c: 89 d6 mov %edx,%esi
c0806e4e: 89 4d f0 mov %ecx,-0x10(%ebp)
c0806e51: 31 c9 xor %ecx,%ecx
c0806e53: c7 44 24 04 01 00 00 movl $0x1,0x4(%esp)
c0806e5a: 00
c0806e5b: c7 04 24 00 00 00 00 movl $0x0,(%esp)
c0806e62: 89 45 ec mov %eax,-0x14(%ebp)
c0806e65: e8 16 ba ff ff call c0802880 <xhci_check_args>
c0806e6a: 83 f8 ed cmp $0xffffffed,%eax
c0806e6d: 0f 85 bd 00 00 00 jne c0806f30 <xhci_free_dev+0x100>
c0806e73: 8b 86 a8 02 00 00 mov 0x2a8(%esi),%eax
c0806e79: bb 1f 00 00 00 mov $0x1f,%ebx

initialize i for the loop 0 to 31
i = 0x1f

c0806e7e: 8b 55 f0 mov -0x10(%ebp),%edx

edx = xhci

c0806e81: 8b 94 82 a4 00 00 00 mov 0xa4(%edx,%eax,4),%edx

edx = virt_dev(struct xhci_virt_device)

c0806e88: 8d 7a 3c lea 0x3c(%edx),%edi

0x3c if offset to xhci_virt_device.eps[0].stop_cmd_timer
edi = virt_dev->eps[0].stop_cmd_timer

c0806e8b: 90 nop
c0806e8c: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
c0806e90: 83 67 e8 fb andl $0xfffffffb,-0x18(%edi)

-0x18 is offset from stop_cmd_timer to ep_state in xhci_virt_ep

c0806e94: 89 f8 mov %edi,%eax
c0806e96: 81 c7 8c 00 00 00 add $0x8c,%edi

edi = next stop_cmd_timer in the esp array(0x8c should be sizeof xhci_virt_ep)

c0806e9c: e8 bf 4f c5 ff call c045be60 <del_timer_sync>
c0806ea1: 83 eb 01 sub $0x1,%ebx

i--

c0806ea4: 75 ea jne c0806e90 <xhci_free_dev+0x60>

!i loop

<rest of disasm snipped because it's not important for this problem>

EBX: 0000001f means first iteration(i = 0)
EDI: 0000003c this should be address of xhci_virt_device.eps[0].stop_cmd_timer meaning
that the virt_dev is NULL(lea 0x3c(%edx),%edi)

NULL pointer dereference at 00000024 == 0x3c - 0x18 (stop_cmd_timer - ep_state)

So the bottom line is that

virt_dev = xhci->devs[udev->slot_id] is NULL

and the trap is on

virt_dev->eps[i].ep_state &= ~EP_HALT_PENDING;

<log>
[ 34.489095] usb 3-2.4: new high-speed USB device number 7 using xhci_hcd
[ 39.398980] xhci_hcd 0000:05:00.0: Timeout while waiting for address device command
[ 41.823024] usb 3-2.4: Device not responding to set address.
[ 42.020549] usb 3-2.4: device not accepting address 7, error -71
[ 46.663202] xhci_hcd 0000:05:00.0: Timeout while waiting for a slot
[ 63.038765] xhci_hcd 0000:05:00.0: Stopped the command ring failed, maybe the host is dead
[ 63.038783] xhci_hcd 0000:05:00.0: Abort command ring failed
[ 63.039033] xhci_hcd 0000:05:00.0: HC died; cleaning up
[ 63.040009] [sched_delayed] sched: RT throttling activated
[ 63.045558] hub 3-2:1.0: cannot reset port 4 (err = -19)
[ 63.045567] hub 3-2:1.0: cannot disable port 4 (err = -19)
[ 63.045592] BUG: unable to handle kernel NULL pointer dereference at 00000024
</log>

xhci_alloc_dev => xhci_cancel_cmd => usb_hc_died => usb_kick_khubd

Based on the messages it looks like xhci_alloc_dev timeouted while waiting
for a slot, meaning no xhci_alloc_virt_device call and no eps array
initialization. Now hub_events calls hub_port_reset and hub_port_disable
reporting
[ 63.045558] hub 3-2:1.0: cannot reset port 4 (err = -19)
[ 63.045567] hub 3-2:1.0: cannot disable port 4 (err = -19)
and setting USB_STATE_NOTATTACHED which later triggers
hub_quiesce => ... => xhci_free_dev and we got the NULL pointer dereference.

Please note that I know practically nothing about usb/xhci, so I for sure could
overlooked something. That's also a reason I'm not sending a patch, because
I'm not sure if simple virt_dev != NULL in xhci_free_dev is enough/correct to
fix this.

Thank you

--
Frantisek Hrbata
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/