Re: [PATCH] xhci: clear port_remote_wakeup after resume failure

From: Mathias Nyman
Date: Tue Jun 04 2019 - 09:54:48 EST

On 27.5.2019 14.28, Nicolas Saenz Julienne wrote:
Hi Matthias,
thanks for the review.

On Mon, 2019-05-27 at 14:16 +0300, Mathias Nyman wrote:
On 24.5.2019 17.52, Nicolas Saenz Julienne wrote:
This was seen on a Dell Precision 5520 using it's WD15 dock. The dock's
Ethernet device interfaces with the laptop through one of it's USB3
ports. While idle, the Ethernet device and HCD are suspended by runtime
PM, being the only device connected on the bus. Then, both are resumed on
behalf of the Ethernet device, which has remote wake-up capabilities.

The Ethernet device was observed to randomly disconnect from the USB
port shortly after submitting it's remote wake-up request. Probably a
weird timing issue yet to be investigated. This causes runtime PM to
busyloop causing some tangible CPU load. The reason is the port gets
stuck in the middle of a remote wake-up operation, waiting for the
device to switch to U0. This never happens, leaving "port_remote_wakeup"
enabled, and automatically triggering a failure on any further suspend

This patch clears "port_remote_wakeup" upon detecting a device with a
wrong resuming port state (see Table 4-9 in Making sure the
above mentioned situation doesn't trigger a PM busyloop.

There was a similar case where the USB3 link had transitioned to a
lower power U1 or U2 state after resume, before driver read the state,
leaving port_remote_wakeup flag uncleared.

This was fixed in 5.1 kernel by commit:

6cbcf59 xhci: Fix port resume done detection for SS ports with LPM enable

Can you check if you have it?
It should be in recent stable releases as well.

I was aware of that patch, unfortunately it doesn't address the same issue. In
my case I never get a second port status event (so no PLC == 1 or any state
change seen in PLS). The device simply disconnects from the bus.

I see, ok, then we need to clear the flag in the hub thread.

But to me it looks like this patch could cause a small race risk in the successful
device initiated resume cases.

If the hub thread, i.e. the get_port_status() function, notices the U0 state before
the interrupt handler, i.e. handle_port_status() function, then port_remote_wakeup
flag is cleared in the hub thread and the wakeup notification is never called from

Would it be enough to just check for (port_remote_wakeup flag && !PORT_CONNECT) in the hub thread?
USB3 PORT_CONNECT bit is lost in most error cases.