Re: Question about error from xhci-hcd
From: Andiry Xu
Date: Wed Dec 28 2011 - 22:32:52 EST
On 12/29/2011 12:30 AM, Larry Finger wrote:
> On 11/14/2011 03:18 AM, Andiry Xu wrote:
>> On 11/02/2011 12:06 AM, Larry Finger wrote:
>>> On 10/30/2011 12:04 AM, Sarah Sharp wrote:
>>>> The xHCI driver allocates a fixed-size endpoint ring, and only so much
>>>> data can fit on it. If the driver is allocating many URBs or many URBs
>>>> with a lot of data, then you will see these messages and the URBs will
>>>> fail to be submitted. Now if neither of those conditions are true,
>>>> it's possible we just have a bug in the xHCI driver.
>>>> There is a patchset in the works to dynamically expand the endpoint
>>>> rings, but it's still going through revisions:
>>> I have a bit more to report. Applying the above patch set did not help.
>>> I modified the xHCI driver from 3.1-rc10 to provide a stack dump
>>> whenever the messages appeared. The "short transfer on control ep"
>>> occurs before the rtl8192cu device has been plugged and has the
>>> following dump, which is probably not informative:
>>> [ 3.988197] xhci_hcd 0000:05:00.0: WARN: short transfer on control ep
>>> [ 3.988208] Pid: 0, comm: kworker/0:0 Not tainted
>>> 3.1.0-0301rc9-generic #201110050905
>>> [ 3.988213] Call Trace:
>>> [ 3.988225] [<c135788d>] ? dev_warn+0x2d/0x30
>>> [ 3.988238] [<f80852d5>] xhci_irq+0x1035/0x1050 [xhci_hcd]
>>> [ 3.988249] [<c1079827>] ? tick_program_event+0x27/0x40
>>> [ 3.988261] [<f808531c>] xhci_msi_irq+0x2c/0x30 [xhci_hcd]
>>> [ 3.988270] [<c10ac5b8>] handle_irq_event_percpu+0x48/0x190
>>> [ 3.988279] [<c10aee40>] ? irq_set_chip_and_handler_name+0x40/0x40
>>> [ 3.988286] [<c10ac73f>] handle_irq_event+0x3f/0x60
>>> [ 3.988294] [<c10aee40>] ? irq_set_chip_and_handler_name+0x40/0x40
>>> [ 3.988301] [<c10aee9b>] handle_edge_irq+0x5b/0xf0
>>> [ 3.988305]<IRQ> [<c1546a31>] ? do_IRQ+0x41/0xb0
>>> [ 3.988320] [<c1542950>] ? notifier_call_chain+0x30/0x60
>>> [ 3.988328] [<c1546970>] ? common_interrupt+0x30/0x38
>>> [ 3.988337] [<c104007b>] ? sched_debug_show+0x11b/0x5f0
>>> [ 3.988345] [<c12e5524>] ? intel_idle+0xa4/0x100
>>> [ 3.988355] [<c142833c>] ? cpuidle_idle_call+0xac/0x160
>>> [ 3.988364] [<c1001c27>] ? cpu_idle+0x97/0xd0
>>> [ 3.988368] [<c1537e16>] ? start_secondary+0xf6/0x110
>>> Just in case it is needed, the full dmesg output is attached.
>>> Due to wrapping of the dmesg buffer, the first few of stack dumps for
>>> the "ERROR no room on ep ring" messages were lost, but the one I got
>>> came from the following code fragment in
>>> drivers/net/wireless/rtlwifi/usb.c at line 87:
>>> 	usb_fill_control_urb(urb, udev, pipe,
>>> 			     (unsigned char *)dr, buf, len,
>>> 			     usbctrl_async_callback, buf);
>>> 	rc = usb_submit_urb(urb, GFP_ATOMIC);
>>> The value of len for this call is 4. The driver only uses 1, 2, or 4 as
>>> the lengths of writes, at least those that go through usb_submit_urb().
>>> Even the firmware download is done one dword at a time.
>>> We also tested with the xHCI code from the current mainline kernel, i.e.
>>> 3.1-git, but I don't have the dmesg output for that version. If you have
>>> any patches in the pipeline, or anything to test, please send those
>>> to me.
>> A control transfer ring should not be full. Only isoc and bulk transfer
>> will cause ring full with a lot of TDs submitted simultaneously. I
>> suspect the ring is mangled.
>> Please apply the patch attached, enable CONFIG_USB_DEBUG and
>> CONFIG_USB_XHCI_HCD_DEBUGGING and post the dmesg with the "no room on ep
>> ring" error.
> Sorry to take so long to get this diagnostic info to you.
> Attached is a dmesg output. There is one of the short transfer messages
> at 216.57+ seconds.
> Thanks for looking at this.
Thanks for testing. However, I did not find anything abnormal in the
dmesg except for the stack dump, and that seems unrelated to the
"no room on ring" error.
The patch is supposed to print the ep ring when it encounters a "no room
on ring" error, but that path was never triggered. I see you added some
control transfer prints to the dmesg, and they look quite normal: note
the "Toggle cycle state for ring" messages, which mean the ring is
wrapping around normally, and the driver does not report any "no room on
ring" error. Can the error be reproduced frequently, or is it hard to
trigger?
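
To illustrate why "Toggle cycle state for ring" is a sign of a healthy
ring, here is a minimal toy model in C. It is only a sketch and not the
code in xhci-ring.c: the names (toy_ring, toy_ring_queue,
toy_ring_complete), the ring size and the one-slot-free full check are
all assumptions made for illustration. The last slot plays the role of
the link TRB; each time the enqueue index passes it, the producer cycle
state is toggled. Queueing fails only when the enqueue index would catch
up with the dequeue index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ring size; the last slot stands in for the link TRB. */
#define TOY_RING_SLOTS 8
#define TOY_USABLE     (TOY_RING_SLOTS - 1)

struct toy_ring {
	unsigned int enq;         /* next slot the driver will fill */
	unsigned int deq;         /* next slot the hardware will complete */
	unsigned int cycle_state; /* producer cycle bit */
	unsigned int wraps;       /* how often the cycle state was toggled */
	unsigned char cycle[TOY_USABLE]; /* cycle bit written to each slot */
};

static void toy_ring_init(struct toy_ring *r)
{
	assert(r != NULL);
	r->enq = 0;
	r->deq = 0;
	r->cycle_state = 1;
	r->wraps = 0;
	for (unsigned int i = 0; i < TOY_USABLE; i++)
		r->cycle[i] = 0;
}

/* Free slots; one slot stays empty so that full differs from empty. */
static unsigned int toy_ring_room(const struct toy_ring *r)
{
	return (r->deq + TOY_USABLE - r->enq - 1) % TOY_USABLE;
}

/* Queue one TRB; false corresponds to "no room on ep ring". */
static bool toy_ring_queue(struct toy_ring *r)
{
	if (toy_ring_room(r) < 1)
		return false;
	r->cycle[r->enq] = (unsigned char)r->cycle_state;
	r->enq++;
	if (r->enq == TOY_USABLE) {
		/* Reached the link slot: wrap and toggle the cycle state. */
		r->enq = 0;
		r->cycle_state ^= 1;
		r->wraps++;
	}
	return true;
}

/* Complete the oldest queued TRB; false if nothing is outstanding. */
static bool toy_ring_complete(struct toy_ring *r)
{
	if (r->deq == r->enq)
		return false;
	r->deq = (r->deq + 1) % TOY_USABLE;
	return true;
}
```

In this model, a control-style pattern (queue one TRB, wait for it to
complete, queue the next) wraps the ring again and again without
toy_ring_queue() ever failing, which is what the dmesg shows. The queue
only fails when many TRBs are queued and none of them complete.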
The short transfer message is normal for control transfers too. Sarah
has posted some patches to remove the corresponding printk and downgrade
the warning level, so people will not be scared.