Re: [PATCH] USB: Fix the issue of task recovery failure caused by USB status when S4 wakes up

From: duanchenghao
Date: Fri Sep 13 2024 - 05:39:18 EST


在 2024-09-13星期五的 09:51 +0800,duanchenghao写道:
> 在 2024-09-12星期四的 11:00 -0400,Alan Stern写道:
> > On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote:
> > > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道:
> > > > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote:
> > > > > S4 wakeup restores the image that was saved before the system
> > > > > entered
> > > > > the S4 sleep state.
> > > > >
> > > > >     S4 waking up from hibernation
> > > > >     =============================
> > > > >     kernel initialization
> > > > >     |  
> > > > >     v  
> > > > >     freeze user task and kernel thread
> > > > >     |  
> > > > >     v  
> > > > >     load saved image
> > > > >     |   
> > > > >     v  
> > > > >     freeze the peripheral device and controller
> > > > >     (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If
> > > > > it
> > > > > is
> > > > > set,
> > > > >      return to EBUSY and do not perform the following restore
> > > > > image.)
> > > >
> > > > Why is the flag set at this point?  It should not be; the
> > > > device
> > > > and
> > > > controller should have been frozen with wakeup disabled.
> > > >
> > > This is check point, not set point.
> >
> > Yes, I know that.  But when the flag was checked, why did the code
> > find
> > that it was set?  The flag should have been clear.
>
> Yes, the current issue is that during S4 testing, there is a
> probabilistic scenario where clear_bit is not called after set_bit,
> or
> clear_bit is called but does not execute after set_bit. Please refer
> to
> the two modification points in the v2 patch for details, as both of
> them can cause the current issue.
>
> >
> > > > Is your problem related to the one discussed in this email
> > > > thread?
> > > >
> > > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@xxxxxxxxxxxxxxxxxxx/
> > > >
> > > > Would the suggestion I made there -- i.e., have the xhci-hcd
> > > > interrupt handler skip calling usb_hcd_resume_root_hub() if the
> > > > root
> > > > hub
> > > > was suspended with wakeup = 0 -- fix your problem?
> > >
> > > Skipping usb_hcd_resume_root_hub() should generally be possible,
> > > but
> > > it's important to ensure that normal remote wakeup functionality
> > > is
> > > not
> > > compromised. Is it HUB_SUSPEND that the hub you are referring to
> > > is
> > > in
> > > a suspended state?
> >
> > I don't understand this question.  hub_quiesce() gets called with
> > HUB_SUSPEND when the hub enters a suspended state.
> >
> > You are correct about the need for normal remote wakeup to work
> > properly.  The interrupt handler should skip calling
> > usb_hcd_resume_root_hub() for port connect or disconnect changes
> > and
> > for
> > port overcurrent changes (when the root hub is suspended with
> > wakeup
> > =
> > 0).  But it should _not_ skip calling usb_hcd_resume_root_hub() for
> > port
> > resume events.
>
> The current issue arises when rh_state is detected as RH_SUSPEND and
> usb_hcd_resume_root_hub() is called to resume the root hub. However,
> there is no mutual exclusion between the suspend flag, set_bit, and
> clear_bit, which can lead to two scenarios:
>
>     1. After set_bit is called, the state of the USB device is
> modified
> by another process to !USB_STATE_SUSPEND, preventing the hub's resume
> from being executed, and consequently, clear_bit is not called again.
>
>     2. In another scenario, during the hub resume process, after
> HCD_FLAG_WAKEUP_PENDING is cleared by clear_bit, rh_state has not yet
> been set to !RH_SUSPENDED. At this point, set_bit is executed, but
> since the hub has already entered the running state, the clear_bit
> associated with the resume operation is not executed.
>
> Please review the v2 patch, where I have described both the logical
> flow before the modification and the revised logical flow after the
> modification.
>

In fact, issue point 2 in the patch is introduced by issue point 1, and
issue point 2 represents a further improvement. The main issue lies in
point 1, where after the execution of the top half of the interrupt is
completed, the bottom half is frozen by S4. As a result, the USB resume
is not executed during this S4 process, and clear_bit is not called as
well. This further leads to a situation where during the process of S4
putting the USB controller into suspend, the check for
HCD_FLAG_WAKEUP_PENDING being set returns -EBUSY.

> >
> > Alan Stern
>