Re: [PATCH] USB: Fix the issue of task recovery failure caused by USB status when S4 wakes up

From: duanchenghao
Date: Thu Sep 12 2024 - 21:52:13 EST


在 2024-09-12星期四的 11:00 -0400,Alan Stern写道:
> On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote:
> > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道:
> > > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote:
> > > > S4 wakeup restores the image that was saved before the system
> > > > entered
> > > > the S4 sleep state.
> > > >
> > > >     S4 waking up from hibernation
> > > >     =============================
> > > >     kernel initialization
> > > >     |  
> > > >     v  
> > > >     freeze user task and kernel thread
> > > >     |  
> > > >     v  
> > > >     load saved image
> > > >     |   
> > > >     v  
> > > >     freeze the peripheral device and controller
> > > >     (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If it
> > > > is
> > > > set,
> > > >      return to EBUSY and do not perform the following restore
> > > > image.)
> > >
> > > Why is the flag set at this point?  It should not be; the device
> > > and
> > > controller should have been frozen with wakeup disabled.
> > >
> > This is check point, not set point.
>
> Yes, I know that.  But when the flag was checked, why did the code
> find
> that it was set?  The flag should have been clear.

Yes, the current issue is that during S4 testing, there is a
probabilistic scenario where clear_bit is not called after set_bit, or
clear_bit is called but does not execute after set_bit. Please refer to
the two modification points in the v2 patch for details, as both of
them can cause the current issue.

>
> > > Is your problem related to the one discussed in this email
> > > thread?
> > >
> > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@xxxxxxxxxxxxxxxxxxx/
> > >
> > > Would the suggestion I made there -- i.e., have the xhci-hcd
> > > interrupt handler skip calling usb_hcd_resume_root_hub() if the
> > > root
> > > hub
> > > was suspended with wakeup = 0 -- fix your problem?
> >
> > Skipping usb_hcd_resume_root_hub() should generally be possible,
> > but
> > it's important to ensure that normal remote wakeup functionality is
> > not
> > compromised. Is it HUB_SUSPEND that the hub you are referring to is
> > in
> > a suspended state?
>
> I don't understand this question.  hub_quiesce() gets called with
> HUB_SUSPEND when the hub enters a suspended state.
>
> You are correct about the need for normal remote wakeup to work
> properly.  The interrupt handler should skip calling
> usb_hcd_resume_root_hub() for port connect or disconnect changes and
> for
> port overcurrent changes (when the root hub is suspended with wakeup
> =
> 0).  But it should _not_ skip calling usb_hcd_resume_root_hub() for
> port
> resume events.

The current issue arises when rh_state is detected as RH_SUSPEND and
usb_hcd_resume_root_hub() is called to resume the root hub. However,
there is no mutual exclusion between the suspend flag, set_bit, and
clear_bit, which can lead to two scenarios:

1. After set_bit is called, the state of the USB device is modified
by another process to !USB_STATE_SUSPEND, preventing the hub's resume
from being executed, and consequently, clear_bit is not called again.

2. In another scenario, during the hub resume process, after
HCD_FLAG_WAKEUP_PENDING is cleared by clear_bit, rh_state has not yet
been set to !RH_SUSPENDED. At this point, set_bit is executed, but
since the hub has already entered the running state, the clear_bit
associated with the resume operation is not executed.

Please review the v2 patch, where I have described both the logical
flow before the modification and the revised logical flow after the
modification.

>
> Alan Stern