Re: [PATCH RESEND v3 0/5] media: uvcvideo: Fix race conditions

From: Guenter Roeck
Date: Thu Sep 17 2020 - 22:16:34 EST


Hi Laurent,

On 9/17/20 5:47 AM, Laurent Pinchart wrote:
> Hi Guenter,
>
> On Wed, Sep 16, 2020 at 07:25:42PM -0700, Guenter Roeck wrote:
>> Something seems to have gone wrong with v3 of this patch series.
>> I am sure I sent it out, but I don't find it anywhere.
>> Resending. Sorry for any duplicates.
>
> I haven't checked the mailing list, but I've found it in my inbox :-)
> I'm not forgetting about you, just been fairly busy recently. I still
> plan to try and provide an alternative implementation in the V4L2 core
> (in a form that I think should even be moved to the cdev core) that
> would fix this for all drivers.
>
Thanks for letting me know. As it turns out, this problem is responsible
for about 2% of all Chromebook crashes, so I'll probably not wait for
the series to be accepted upstream but apply it as-is to the various
ChromeOS kernel branches.

> By the way, as you managed to get hold of non-UVC webcams, one thing you
> could try in your tests to make the drivers misbehave is to block on a
> DQBUF call, and unplug the device at that time. When blocking, DQBUF
> releases the driver lock (through the vb2ops .wait_prepare() and
> .wait_finis() operations for drivers based on vb2), so this may allow
> unregistration to proceed without waiting for userspace calls to
> complete.
>

Good idea. I'll give it a try.

Thanks,
Guenter

>> The uvcvideo code has no lock protection against USB disconnects
>> while video operations are ongoing. This has resulted in random
>> error reports, typically pointing to a crash in usb_ifnum_to_if(),
>> called from usb_hcd_alloc_bandwidth(). A typical traceback is as
>> follows.
>>
>> usb 1-4: USB disconnect, device number 3
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
>> PGD 0 P4D 0
>> Oops: 0000 [#1] PREEMPT SMP PTI
>> CPU: 0 PID: 5633 Comm: V4L2CaptureThre Not tainted 4.19.113-08536-g5d29ca36db06 #1
>> Hardware name: GOOGLE Edgar, BIOS Google_Edgar.7287.167.156 03/25/2019
>> RIP: 0010:usb_ifnum_to_if+0x29/0x40
>> Code: <...>
>> RSP: 0018:ffffa46f42a47a80 EFLAGS: 00010246
>> RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff904a396c9000
>> RDX: ffff904a39641320 RSI: 0000000000000001 RDI: 0000000000000000
>> RBP: ffffa46f42a47a80 R08: 0000000000000002 R09: 0000000000000000
>> R10: 0000000000009975 R11: 0000000000000009 R12: 0000000000000000
>> R13: ffff904a396b3800 R14: ffff904a39e88000 R15: 0000000000000000
>> FS: 00007f396448e700(0000) GS:ffff904a3ba00000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: 0000000000000000 CR3: 000000016cb46000 CR4: 00000000001006f0
>> Call Trace:
>> usb_hcd_alloc_bandwidth+0x1ee/0x30f
>> usb_set_interface+0x1a3/0x2b7
>> uvc_video_start_transfer+0x29b/0x4b8 [uvcvideo]
>> uvc_video_start_streaming+0x91/0xdd [uvcvideo]
>> uvc_start_streaming+0x28/0x5d [uvcvideo]
>> vb2_start_streaming+0x61/0x143 [videobuf2_common]
>> vb2_core_streamon+0xf7/0x10f [videobuf2_common]
>> uvc_queue_streamon+0x2e/0x41 [uvcvideo]
>> uvc_ioctl_streamon+0x42/0x5c [uvcvideo]
>> __video_do_ioctl+0x33d/0x42a
>> video_usercopy+0x34e/0x5ff
>> ? video_ioctl2+0x16/0x16
>> v4l2_ioctl+0x46/0x53
>> do_vfs_ioctl+0x50a/0x76f
>> ksys_ioctl+0x58/0x83
>> __x64_sys_ioctl+0x1a/0x1e
>> do_syscall_64+0x54/0xde
>>
>> While there are not many references to this problem on mailing lists, it is
>> reported on a regular basis on various Chromebooks (roughly 300 reports
>> per month). The problem is relatively easy to reproduce by adding msleep()
>> calls into the code.
>>
>> I tried to reproduce the problem with non-uvcvideo webcams, but was
>> unsuccessful. I was unable to get Philips (pwc) webcams to work. gspca
>> based webcams don't experience the problem, or at least I was unable to
>> reproduce it (The gspa driver does not trigger sending USB messages in the
>> open function, and otherwise uses the locking mechanism provided by the
>> v4l2/vb2 core).
>>
>> I don't presume to claim that I found every issue, but this patch series
>> should fix at least the major problems.
>>
>> The patch series was tested exensively on a Chromebook running chromeos-4.19
>> and on a Linux system running a v5.8.y based kernel.
>>
>> v3:
>> - In patch 5/5, add missing calls to usb_autopm_put_interface() and kfree()
>> to failure code path
>>
>> v2:
>> - Added details about problem frequency and testing with non-uvc webcams
>> to summary
>> - In patch 4/5, return EPOLLERR instead of -ENODEV on poll errors
>> - Fix description in patch 5/5
>>
>> ----------------------------------------------------------------
>> Guenter Roeck (5):
>> media: uvcvideo: Cancel async worker earlier
>> media: uvcvideo: Lock video streams and queues while unregistering
>> media: uvcvideo: Release stream queue when unregistering video device
>> media: uvcvideo: Protect uvc queue file operations against disconnect
>> media: uvcvideo: Abort uvc_v4l2_open if video device is unregistered
>>
>> drivers/media/usb/uvc/uvc_ctrl.c | 11 ++++++----
>> drivers/media/usb/uvc/uvc_driver.c | 12 ++++++++++
>> drivers/media/usb/uvc/uvc_queue.c | 32 +++++++++++++++++++++++++--
>> drivers/media/usb/uvc/uvc_v4l2.c | 45 ++++++++++++++++++++++++++++++++++++--
>> drivers/media/usb/uvc/uvcvideo.h | 1 +
>> 5 files changed, 93 insertions(+), 8 deletions(-)
>