Re: Asmedia USB 1343 crashes

From: Thomas Fjellstrom
Date: Thu May 04 2017 - 11:18:12 EST


On Thursday, May 4, 2017 6:02:53 PM MDT Mathias Nyman wrote:
> On 03.05.2017 22:20, Thomas Fjellstrom wrote:
> > On Wednesday, May 3, 2017 1:54:39 PM MDT Alan Stern wrote:
> >> On Tue, 2 May 2017, Thomas Fjellstrom wrote:
> >>
> >>> I just had a brief lockup: the desktop stopped responding, as did other USB
> >>> devices that are not on the USB 3 controller. Two Android devices were in the
> >>> process of restarting.
> >>>
> >>> It doesn't seem to matter which Android devices are involved.
> >>>
> >>> [294503.849350] ------------[ cut here ]------------
> >>> [294503.849362] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x223/0x230
> >>> [294503.849365] NETDEV WATCHDOG: enp4s0 (igb): transmit queue 0 timed out
> >>> [294503.849367] Modules linked in: sr_mod cdrom ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype xt_conntrack nf_nat nf_conntrack br_netfilter overlay ebtable_filter ebtables ip6table_filter ip6_tables nfsv3 nfs_acl nfs lockd grace iptable_filter bridge stp llc amdgpu mfd_core fuse vfat fat eeepc_wmi asus_wmi rfkill edac_mce_amd edac_core pcspkr sg amdkfd radeon ttm sunrpc k10temp it87 hwmon_vid fam15h_power efivarfs ip_tables ipv6 autofs4 crc32c_intel i2c_piix4
> >>> [294503.849407] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.11.0-rc7 #8
> >>> [294503.849410] Hardware name: To be filled by O.E.M. To be filled by O.E.M./970 PRO GAMING/AURA, BIOS 0901 11/07/2016
> >>> [294503.849413] Call Trace:
> >>> [294503.849417] <IRQ>
> >>> [294503.849422] dump_stack+0x4d/0x63
> >>> [294503.849426] __warn+0xc6/0xe0
> >>> [294503.849430] warn_slowpath_fmt+0x46/0x50
> >>> [294503.849434] dev_watchdog+0x223/0x230
> >>> [294503.849438] ? qdisc_rcu_free+0x40/0x40
> >>> [294503.849442] call_timer_fn+0x30/0x160
> >>> [294503.849445] ? qdisc_rcu_free+0x40/0x40
> >>> [294503.849448] run_timer_softirq+0x1e1/0x440
> >>> [294503.849453] ? lapic_next_event+0x18/0x20
> >>> [294503.849456] ? sched_clock_cpu+0x11/0xd0
> >>> [294503.849459] __do_softirq+0x101/0x2f0
> >>> [294503.849463] irq_exit+0xb9/0xc0
> >>> [294503.849466] smp_apic_timer_interrupt+0x38/0x50
> >>> [294503.849470] apic_timer_interrupt+0x86/0x90
> >>> [294503.849474] RIP: 0010:acpi_idle_do_entry+0x2c/0x40
> >>> [294503.849476] RSP: 0018:ffffffffb2a03d90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
> >>> [294503.849480] RAX: 0000000000000000 RBX: ffff884d1a966c00 RCX: 0000000000000034
> >>> [294503.849483] RDX: 4ec4ec4ec4ec4ec5 RSI: 0000000000000001 RDI: ffff884d1a966c64
> >>> [294503.849485] RBP: ffffffffb2a03dd0 R08: 00000000000003e3 R09: 0000000000000018
> >>> [294503.849487] R10: 00000000000003c1 R11: 00000000000003d4 R12: ffff884d1a966c64
> >>> [294503.849490] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000001
> >>> [294503.849492] </IRQ>
> >>> [294503.849497] ? acpi_idle_enter+0xd7/0x290
> >>> [294503.849502] cpuidle_enter_state+0xed/0x2e0
> >>> [294503.849506] cpuidle_enter+0x12/0x20
> >>> [294503.849509] call_cpuidle+0x1e/0x30
> >>> [294503.849512] do_idle+0x179/0x1d0
> >>> [294503.849515] cpu_startup_entry+0x5d/0x60
> >>> [294503.849518] rest_init+0x7f/0x90
> >>> [294503.849522] start_kernel+0x405/0x412
> >>> [294503.849525] x86_64_start_reservations+0x24/0x26
> >>> [294503.849528] x86_64_start_kernel+0x182/0x193
> >>> [294503.849531] start_cpu+0x14/0x14
> >>> [294503.849534] ? start_cpu+0x14/0x14
> >>> [294503.849537] ---[ end trace 12db587e781d6e4f ]---
> >>> [294503.849558] igb 0000:04:00.0 enp4s0: Reset adapter
> >>> [294504.576629] xhci_hcd 0000:02:00.0: Stop command ring failed, maybe the host is dead
> >>> [294504.576656] xhci_hcd 0000:02:00.0: Abort command ring failed
> >>> [294504.576799] xhci_hcd 0000:02:00.0: xHCI host not responding to stop endpoint command.
> >>> [294504.576805] xhci_hcd 0000:02:00.0: Assuming host is dying, halting host.
> >>
> >> At this point you have reached the limit of my knowledge. The best
> >> person to help is Mathias Nyman, the xHCI maintainer (CC'ed).
> >>
>
> For some reason, stopping the command ring fails. The ring is stopped by writing
> a bit in a register, and the hardware is supposed to clear another bit in the
> same register once the ring has stopped. We poll for the second bit immediately
> after writing the first. If the second bit is not cleared after 5 seconds, we
> bail out.
>
> It could be that the hardware never clears the bit.
>
> You said you had two Android phones connected, and both were restarting.
> It could be a race in the command ring stopping code.
>
> Can you reproduce this xHCI issue with only one Android device connected?

I'll try my best. I have had issues with this controller with devices simply
connected and nothing restarting, so I can't guarantee that I can reproduce
the exact same issue right away.

> -Mathias


--
Thomas Fjellstrom
thomas@xxxxxxxxxxxxx