Re: [Regression][v3.17][v3.18] USB: Fix persist resume of some SS USB devices

From: Pratyush Anand
Date: Wed Nov 05 2014 - 03:49:17 EST


On Wed, Nov 05, 2014 at 10:53:35AM +0530, Pratyush Anand wrote:
> Hello Joe,
>
> On Wed, Nov 05, 2014 at 06:20:06AM +0800, Joseph Salisbury wrote:
> > Hello Pratyush,
> >
> > A kernel bug report was opened against Ubuntu [0]. After a kernel
> > bisect, it was found that reverting the following commit resolved this bug:
> >
> > commit a40178b2fa6ad87670fb1e5fa4024db00c149629
> > Author: Pratyush Anand <pratyush.anand@xxxxxx>
> > Date: Fri Jul 18 12:37:10 2014 +0530
> >
> > USB: Fix persist resume of some SS USB devices
> >
> > The regression was introduced as of v3.17-rc1. The regression still
> > exists in the 3.18-rc3 mainline kernel, and has made it's way into the
> > stable kernels.
>
> Its strange, as per my understanding patch does not introduce any side
> effect for devices whose resume time is normal. So, if a device is
> connected to the SS port and it was working after resume from
> suspend to ram without this patch, then that device should still work
> with this patch.
>
> Infact this has resolved another bug reported to bugzilla
> https://bugzilla.kernel.org/show_bug.cgi?id=53211
>
>
> >
> > I was hoping to get your feedback, since you are the patch author. Do
> > you think gathering any additional data will help diagnose this issue,
>
> Yaa, sure additional info will help to understand the issue.
> -- dmesg log of suspend resume when this patch is not applied and also
> when applied.

I see that you have already provided both log to the buglink. I had a
look on that.


When it fails (with this patch):
Oct 22 15:38:59 mana kernel: [ 59.825122] PM: resume of devices
complete after 22223.878 msecs

When it passed (without this patch):
Oct 22 15:37:19 mana kernel: [ 53.495109] PM: resume of devices
complete after 3641.933 msecs
However, even when patch was not present(and it passed), there is a logical
disconnection of devices, so you would face the issue mentioned in
https://bugzilla.kernel.org/show_bug.cgi?id=53211.

Looking to the timeout, it seems that wait loop went for full length
(2S) for all the devices, still could not find a connected device. So,
basically SS link between root hub and hub was not up in 2 sec. Not
sure why, but reading further hub port status caused some fatal issue
with xhci host and so it is not able to get new device connection
after logical disconnection.

Increasing the timeout value may help. But with this long timeout I see a
possibility of sync issue between port_event and usb_port_resume. It
might happen that port_event reads hub port status before
usb_port_resume handles reset_resume.

Can you try following patch with step increasing varied value of
PORT_ENABLE_WAIT, and then let me know which value of
PORT_ENABLE_WAIT works for you (if it works ;)).

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 11e80ac31324..6eaf481d3d53 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -3311,13 +3311,14 @@ static int finish_port_resume(struct usb_device *udev)
* This routine should only be called when persist is enabled for a SS
* device.
*/
+#define PORT_ENABLE_WAIT 2000
static int wait_for_ss_port_enable(struct usb_device *udev,
struct usb_hub *hub, int *port1,
u16 *portchange, u16 *portstatus)
{
int status = 0, delay_ms = 0;

- while (delay_ms < 2000) {
+ while (delay_ms < PORT_ENABLE_WAIT) {
if (status || *portstatus & USB_PORT_STAT_CONNECTION)
break;
msleep(20);
@@ -4881,6 +4882,22 @@ static void hub_port_connect_change(struct usb_hub *hub, int port1,
usb_lock_port(port_dev);
}

+static void wait_for_reset_resume(struct usb_hub *hub, int port)
+{
+ struct usb_device *udev;
+ int delay_ms = 0;
+
+ udev = hub->ports[port -1]->child;
+ if (udev && udev->reset_resume) {
+ while (delay_ms < PORT_ENABLE_WAIT) {
+ if (!udev->reset_resume)
+ break;
+ msleep(20);
+ delay_ms += 20;
+ }
+ }
+}
+
static void port_event(struct usb_hub *hub, int port1)
__must_hold(&port_dev->status_lock)
{
@@ -4894,6 +4911,8 @@ static void port_event(struct usb_hub *hub, int port1)
clear_bit(port1, hub->event_bits);
clear_bit(port1, hub->wakeup_bits);

+ wait_for_reset_resume(hub, port1);
+
if (hub_port_status(hub, port1, &portstatus, &portchange) < 0)
return;

Regards
Pratyush

> -- dmesg log with following additional debug print.
> @@ -3318,6 +3318,7 @@ static int wait_for_ss_port_enable(struct usb_device *udev,
> int status = 0, delay_ms = 0;
>
> while (delay_ms < 2000) {
> + printk("%s:portstatus is %x and portchange is %x\n", __func__, *portstatus, *portchange);
> if (status || *portstatus & USB_PORT_STAT_CONNECTION)
> break;
>
> > or would it be best to submit a revert request?
>
> Let me see what is the portstatus value and why didn't it break loop in first
> iteration if device was OK.
>
> Regards
> Pratyush
>
> >
> >
> > Thanks,
> >
> > Joe
> >
> > [0] http://pad.lv/1384041
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/