Re: [PATCH] usb: gadget: f_uvc: fix NULL pointer dereference during unbind race

From: Alan Stern

Date: Wed Feb 25 2026 - 10:52:33 EST


On Wed, Feb 25, 2026 at 10:31:54AM +0800, Jimmy Hu (xWF) wrote:
> On Tue, Feb 24, 2026 at 11:47 PM Alan Stern <stern@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Tue, Feb 24, 2026 at 04:39:55PM +0800, Jimmy Hu wrote:
> > > Commit b81ac4395bbe ("usb: gadget: uvc: allow for application to cleanly
> > > shutdown") introduced two stages of synchronization waits totaling 1500ms
> > > in uvc_function_unbind() to prevent several types of kernel panics.
> > > However, this timing-based approach is insufficient during power
> > > management (PM) transitions.
> > >
> > > When the PM subsystem starts freezing user space processes, the
> > > wait_event_interruptible_timeout() is aborted early, which allows the
> > > unbind thread to proceed and nullify the gadget pointer
> > > (cdev->gadget = NULL):
> > >
> > > [ 814.123447][ T947] configfs-gadget.g1 gadget.0: uvc: uvc_function_unbind()
> > > [ 814.178583][ T3173] PM: suspend entry (deep)
> > > [ 814.192487][ T3173] Freezing user space processes
> > > [ 814.197668][ T947] configfs-gadget.g1 gadget.0: uvc: uvc_function_unbind no clean disconnect, wait for release
> > >
> > > When the PM subsystem resumes or aborts the suspend and tasks are
> > > restarted, the V4L2 release path is executed and attempts to access the
> > > already nullified gadget pointer, triggering a kernel panic:
> > >
> > > [ 814.292597][ C0] PM: pm_system_irq_wakeup: 479 triggered dhdpcie_host_wake
> > > [ 814.386727][ T3173] Restarting tasks ...
> > > [ 814.403522][ T4558] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000030
> > > [ 814.404021][ T4558] pc : usb_gadget_deactivate+0x14/0xf4
> > > [ 814.404031][ T4558] lr : usb_function_deactivate+0x54/0x94
> > > [ 814.404078][ T4558] Call trace:
> > > [ 814.404080][ T4558] usb_gadget_deactivate+0x14/0xf4
> > > [ 814.404083][ T4558] usb_function_deactivate+0x54/0x94
> > > [ 814.404087][ T4558] uvc_function_disconnect+0x1c/0x5c
> > > [ 814.404092][ T4558] uvc_v4l2_release+0x44/0xac
> > > [ 814.404095][ T4558] v4l2_release+0xcc/0x130
> > >
> > > The fix introduces a 'func_unbinding' flag in struct uvc_device to protect
> > > critical sections:
> > > 1. In uvc_function_disconnect(), it prevents accessing the nullified
> > > cdev->gadget pointer.
> > > 2. In uvc_v4l2_release(), it ensures uvcg_free_buffers() is skipped
> > > if unbind is already in progress, avoiding races with concurrent
> > > bind operations or use-after-free on the video queue memory.
> >
> > Sorry if the answer to this question is obvious to anybody familiar with
> > the driver...
> >
> > The patch adds a flag that can be accessed by two different tasks
> > (disconnect and release). Is there any synchronization to prevent these
> > tasks from racing and accessing the new flag concurrently?
> >
> > Alan Stern
>
> Hi Alan,
>
> Thanks for pointing that out. You're right, the boolean flag lacks
> proper synchronization for concurrent access.
> I will submit a V2 patch using atomic bit operations to ensure memory
> visibility and atomicity across tasks. The changes will include:
> * Replacing 'bool func_unbinding' with 'unsigned long flags' in struct
> uvc_device.
> * Using clear_bit() in uvc_function_bind() to reset the state.
> * Using set_bit() in uvc_function_unbind() to mark the unbinding phase.
> * Using test_bit() in uvc_function_disconnect() and uvc_v4l2_disable()
> for safe checks.
>
> Does this approach look reasonable to you?

No, because data races are more complicated than just concurrent
non-atomic accesses. There's also the issue of ordering. Here's what I
mean...

You want to change the bind routine to do something like:

cdev->gadget = ...
set_bit(&uvc->flags, 0); // Driver is bound

and unbind to do something like:

clear_bit(&uvc->flags, 0); // Driver is unbound
cdev->gadget = NULL;

You'll also change disconnect and disable like this:

if (test_bit(&uvc->flags, 0)) {
... use cdev->gadget ...
}

The problem is that modern CPUs can execute instructions out of order
and can speculate loads. When the CPU runs the bind routine, there's
nothing to prevent it from doing the set_bit() before assigning
cdev->gadget. Similarly, when the CPU runs the unbind routine, there's
nothing to prevent it from setting cdev->gadget to NULL before doing the
clear_bit().

More subtly, when the CPU runs the disconnect or disable routines,
there's nothing to prevent it from speculatively reading cdev->gadget
(and getting a NULL result) before doing the test_bit() (and getting 1).

To prevent these problems, you would need to use memory barriers. For
example, you could put smb_wmb() between the two statements in bind and
unbind, and smp_rmb() in disconnect and disable before cdev->gadget
gets used. (For a lot more information about these memory barriers and
other ordering issues, see
tools/memory-model/Documentation/explanation.txt.)

A simpler approach, if it won't cause any problems, is to ensure
synchronization by protecting all these different pieces of code with a
mutex. Then they would never execute concurrently.

Alan Stern