Re: [PATCH] usb-core: Fix potential null pointer dereference in xhci-debugfs.c

From: Mathias Nyman
Date: Fri Dec 08 2017 - 11:59:21 EST


This is a multi-part message in MIME format. On 08.12.2017 13:06, Alexander Kappner wrote:
Hi,

I think we need to dig a bit deeper. It's good to check if spriv is
valid
but there are probably other reasons than kzalloc failing.

I agree -- this small allocation is unlikely to fail in practice.
Also, while my patch prevents the kernel oops, it also prevents the
debugfs entries from being created.

I've been debugging this more trying to come up with a better
solution, but I might need some guidance as I'm not too familiar with
the USB subsystem. The immediate cause of the crash was usbmuxd
sending a USBDEVFS_SETCONFIGURATION ioctl to a device, which _only if
it fails_ calls usb_hcd_alloc_bandwidth to try and reset the device,
which in turn calls xhci_debugfs_create_endpoint. The ioctl handler
acquires a device-specific lock via usb_lock_device.

When the system resumed from hibernate, xhci_resume was called. This
in turn called xhci_mem_cleanup to deallocate the device structures,
which include setting the debugfs_private pointer to NULL (via
xhci_free_virt_devices_depth_first). It thus seems likely that the
ioctl is somehow racing with the hibernate. The call to xhci_resume
is protected by a host-controller specific lock (xhci->lock) but it
doesn't attempt to take the usb_lock_device device-specific lock.

Now my suspicion is that xhci_resume freed and zeroed the device
structures while racing with the ioctl handler. I can't seem to find
any exclusion mechanism that would prevent xhci_resume from racing
with the USBDEVFS_SETCONFIGURATION ioctl (or any other ioctl, for
that matter). Am I missing something? If not, is there any reason why
an ioctl might need to execute in parallel with the xhci_resume? If
not, can we just do a busy wait in xhci_resume until all pending
ioctls have returned?

I'm not sure, but if I recall correctly then power management is supposed
to make sure a driver doesn't access usb devices while the host controller
is still resuming.

The odd thing here is that
xhci_debugfs_remove_slot(xhci, slot_id), and
xhci_free_virt_device(xhci, slot_id) are called together when
xhci_mem_cleanup() calls xhci_free_virt_devices_depth_first()

That means both the xhci_virt_device *dev and dev->debugfs_private
should both be freed and xhci->devs[slot_id] set NULL for that virt_device.

so xhci_add_endpoint() should fail a lot earlier because the xhci->devs[slot_id]
should be a null pointer as well.

Allocation is also done together in xhci_alloc_dev()

Looking at it more closely there is actually the .free_dev callback that
first frees the dev->debugs_private but the virt_dev is only freed
conditionally later

Attached a patch that frees them together, can you try it out?

If it doesn't help we need to add some elaborate tracing

Thanks
-Mathias