Re: [BUGREPORT] Linux USB 3.0

From: Markus Rechberger
Date: Mon Feb 03 2014 - 18:08:16 EST


Hi Sarah,

On Mon, Jan 20, 2014 at 8:35 PM, Sarah Sharp
<sarah.a.sharp@xxxxxxxxxxxxxxx> wrote:
> Hi Markus,
>
> I'm the xHCI driver maintainer, and it helps to Cc me on USB 3.0 bug
> reports.
>
> On Sat, Dec 28, 2013 at 07:24:20AM +0100, Markus Rechberger wrote:
>> just received following log snippset:
>
> Please state which kernel version you (or your customer) is running.
> You've reported issues with several different kernel versions, so which
> kernel are you running for this particular snippet?
>
>> Dec 27 23:23:50 solist kernel: [ 36.118245] xhci_hcd 0000:00:14.0: ERROR Transfer event TRB DMA ptr
>
> These messages might be harmless. The 3.0 kernel contains a fix for
> Intel Panther Point xHCI hosts that suppresses those messages, commit
> ad808333d8201d53075a11bc8dd83b81f3d68f0b "Intel xhci: Ignore spurious
> successful event."
>
> A later commit extends that to all xHCI 1.0 hosts, commit
> 07f3cb7c28bf3f4dd80bfb136cf45810c46ac474 "usb: host: xhci: Enable
> XHCI_SPURIOUS_SUCCESS for all controllers with xhci 1.0" That was
> queued for 3.11 and marked to be backported into stable kernels as old
> as 3.0.
>
>> the previous bug report of that user:
>> https://bugzilla.kernel.org/show_bug.cgi?id=65021 xhci: complete USB freeze
>
> Hmm, Greg didn't assign that bug to me, so I missed it, sorry.
>
>> On Fri, Dec 27, 2013 at 8:59 PM, Markus Rechberger <mrechberger@xxxxxxxxx> wrote:
>> > Seems like DH87RL was working with 3.2.0-55-generic-pae unfortunately
>> > we don't have such a board for testing and customer patience is
>> > limited to bisect the kernel.
>> >
>> > Does anyone have a clue what modification could have killed USB 3.0
>> > support within those releases?
>> > It does not seem to be SG support.
>
> 3.2 was the kernel where the Intel EHCI to xHCI port switchover code
> went in. Without that code, all ports will remain under the EHCI host,
> and USB 3.0 devices will work at USB 2.0 speeds. I suspect the USB
> device triggers an issue with the xHCI driver, and 3.2 only works
> because the device is on an EHCI port without the switchover code.
>
>> > On Fri, Dec 27, 2013 at 6:18 PM, Markus Rechberger <mrechberger@xxxxxxxxx> wrote:
>> >> I just got another USB 3.0 bugreport, the entire system crashed. That
>> >> particular customer already filed a bugreport in November 2013 that
>> >> his system is in a bad state when using some USB 2.0 media devices
>> >> which even have opensource drivers built into the kernel.
>> >>
>> >> USB 3.0 support with Linux seems to be a disaster with Linux 3.6.12.
>> >> The affected board is an Intel DH87RL board.
>
> Why are they running 3.6.12 in particular? That's not a supported
> stable kernel.
>

our customers are using any kind of linux kernel. The drivers are
using USBFS (devio.c) for interfacing with USB.
It seems like you are in contact with one customer who is using the
DH87RL board.
Just today we got another one in our forum using 3.12.9-2-ARCH.
Also Synology NAS users seem to be affected by the USB 2.0 through USB
3.0 issue.


>> >> On Wed, Dec 25, 2013 at 8:18 AM, Markus Rechberger
>> >> <mrechberger@xxxxxxxxx> wrote:
>> >>> A customer using a device with USBDEVFS is reporting following
>> >>> backtrace (it seems to be a rather generic issue related to linux usb
>> >>> 3.0 in general):
>> >>> According to him this problem is reproducible as soon as he starts the
>> >>> data transfer, is there anything known about that?
>> >>>
>> >>> He is using 3.12.0-031200-generic
>
> So at this point you've reported three separate bugs, all with the same
> symptom, but different kernel versions? Are these all from the same bug
> reporter, or a different bug reporter?
>
> You've got me seriously confused right now. Please keep one bug report
> to one mail thread, and get the original bug reporter to start that
> thread. If this is from one bug reporter, please state the current
> kernel they are running, and send dmesg showing the issue with
> CONFIG_USB_DEBUG and CONFIG_USB_XHCI_HCD_DEBUGGING turned on (you may
> also need to turn on CONFIG_DYNAMIC_DEBUG in later kernels). Please
> attach the dmesg as a file, since your mail client line-wraps.
>
>> >>> Dec 24 14:22:39 homenas kernel: [ 1469.818460] xhci_hcd 0000:0f:00.0: ERROR Transfer event TRB DMA ptr not part of current TD
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd 0000:0f:00.0: ERROR Transfer event TRB DMA ptr not part of current TD
>> >>> Dec 24 14:30:39 homenas kernel: last message repeated 16 times
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd 0000:0f:00.0: WARN Successful completion on short TX
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd 0000:0f:00.0: WARN Successful completion on short TX
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] xhci_hcd 0000:0f:00.0: URB transfer length is wrong, xHC issue? req. len = 46080, act. len = 1382400
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] IP: [] finish_td+0x13f/0x250
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] PGD 0
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] Oops: 0000 [#1] SMP
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] Modules linked in:
>> >>> videodev pci_stub vboxpci(OF) vboxnetadp(OF) vboxnetflt(OF)
>> >>> vboxdrv(OF) dm_crypt snd_hda_codec_ca0132 snd_hda_intel snd_hda_codec
>> >>> snd_hwdep snd_pcm snd_seq_midi dm_multipath psmouse scsi_dh
>> >>> snd_rawmidi serio_raw sb_edac snd_seq_midi_event edac_core snd_seq
>> >>> snd_timer snd_seq_device lpc_ich snd bnep rfcomm soundcore
>> >>> snd_page_alloc bluetooth mei_me mei mac_hid ppdev nfsd w83627ehf
>> >>> hwmon_vid nfs_acl auth_rpcgss coretemp nfs fscache lockd lp parport
>> >>> sunrpc raid10 raid456 async_pq async_xor async_memcpy
>> >>> async_raid6_recov async_tx raid0 multipath linear btrfs raid6_pq xor
>> >>> libcrc32c osst st raid1 tg3 mptsas firewire_ohci ptp mxm_wmi
>> >>> firewire_core ahci mptscsih pps_core crc_itu_t libahci mpt2sas mptbase
>> >>> wmi scsi_transport_sas raid_class [last unloaded: vmnet]
>> >>>
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GF O 3.12.0-031200-generic #201311031935
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X79 Extreme9, BIOS P3.30 01/28/2013
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] task: ffffffff81c144a0 ti: ffffffff81c00000 task.ti: ffffffff81c00000
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] RIP: 0010:[] [] finish_td+0x13f/0x250
>
> It would help if your client could reproduce this oops on their machine,
> and then run markup_oops.pl to find out exactly where the driver is
> oopsing. I suspect it has to do with the bad completion length in the
> line above, but it could be unrelated.
>

well we can try but those are regular endcustomers who just want to
have things work.

Maybe it would help if we would ship a USB DVB
(DVB-C/T/T2/AnalogTV(PAL/NTSC)/FM Radio/S-Video/Composite) stick to
you?
At least you should be able to use NTSC in US using a videorecorder,
or composite with a camera.
Since our devices are using USBFS all the critical code is in
userspace, no crash should be possible actually.
However the reality looks different the linux usb stack causes freezes
and other problems when xhci is active.

The device transfers around 20mb/sec via USBFS for AnalogTV so it's a
good target for testing.

Markus

>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] RSP: 0018:ffff88102fc03ca8 EFLAGS: 00010046
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] RAX: ffff880f865d2b10 RBX: ffff880f865d2b00 RCX: 0000000000000006
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] RDX: ffff880f865d2b10 RSI: 0000000000000007 RDI: 0000000000000046
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] RBP: ffff88102fc03d08 R08: 000000000000000a R09: 0000000000000000
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] R10: 00000000000006fd R11: 00000000000006fc R12: ffff880fd2de0000
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] R13: ffff880fd32b1780 R14: 0000000000000000 R15: ffff880fd5c5f000
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] FS: 0000000000000000(0000) GS:ffff88102fc00000(0000) knlGS:0000000000000000
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] CR2: 0000000000000004 CR3: 0000000001c0d000 CR4: 00000000000407f0
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] Stack:
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] ffff88102fc03ce8 ffff880fd0bc8000 ffff88102fc03d00 ffff880fd268d1a0
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] ffff88102fc03df4 0000000100000002 ffff880fd32b1780 ffff880f865d2b00
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] ffff880fd268d1a0 ffff880fd5c5f000 ffff880fd2de0000 ffff880fd2c497b0
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] Call Trace:
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450]
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] [] process_bulk_intr_td+0x116/0x2d0
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] [] handle_tx_event+0x656/0xb50
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] [] ? __queue_work+0x3b0/0x3c0
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] [] ? call_timer_fn+0x46/0x160
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] [] xhci_handle_event+0x1db/0x2a0
>> >>> Dec 24 14:30:39 homenas kernel: [ 1469.822450] [] ? run_timer_softirq+0x1b2/0x300
>> >>> Dec 24 14:30:39 homenas kernel: [ 1470.312076] [] xhci_irq+0x120/0x1f0
>> >>> Dec 24 14:30:39 homenas kernel: [ 1470.312076] [] xhci_msi_irq+0x11/0x20
>> >>> Dec 24 14:30:39 homenas kernel: [ 1470.312076] [] handle_irq_event_percpu+0x5d/0x210
>> >>> Dec 24 14:30:39 homenas kernel: [ 1470.312076] [] handle_irq_event+0x48/0x70
>> >>> Dec 24 14:30:39 homenas kernel: [ 1470.312076] [] ? native_apic_msr_eoi_write+0x14/0x20
>> >>> Dec 24 14:30:39 homenas kernel: [ 1470.312076] [] handle_edge_irq+0x77/0x110
>
> Sarah Sharp
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/