Re: [RFC PATCH] PM / core: skip suspend next time if resume returns an error

From: Rafael J. Wysocki
Date: Tue Oct 02 2018 - 04:29:07 EST


On Tue, Oct 2, 2018 at 10:05 AM Pavel Machek <pavel@xxxxxx> wrote:
>
> Hi!
>
> > In general Linux doesn't behave super great if you get an error while
> > executing a device's resume handler. Nothing will come along later
> > and try again to resume the device (and all devices that depend on
> > it), so pretty much you're left with a non-functioning device and
> > that's not good.
> >
> > However, even though you'll end up with a non-functioning device we
> > still don't consider resume failures to be fatal to the system. We'll
> > keep chugging along and just hope that the device that failed to
> > resume wasn't too critical. This establishes the precedent that we
> > should at least try our best not to fully bork the system after a
> > resume failure.
> >
> > I will argue that the best way to keep the system in the best shape is
> > to assume that if a resume callback failed that it did as close to
> > no-op as possible. Because of this we should consider the device
> > still suspended and shouldn't try to suspend the device again next
> > time around. Today that's not what happens. AKA if you have a
> > device
>
> I don't think there are many guarantees when a device resume fails. It
> may have done nothing, and it may have resumed the device almost
> fully.
>
> I guess the best option would be to refuse system suspend after some
> device failed like that.
>
> That leaves the user the possibility to debug it...

I guess so.

Doing that in all cases is kind of risky IMO, because we haven't taken
the return values of the ->resume* callbacks into account so far
(except for printing the information, that is), so there may be cases
in which a ->resume* callback returns an error without any lethal
consequences, and the $subject patch would make them stop working.
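
For reference, the three behaviors being compared in this thread can be
modeled in a rough userspace sketch. This is not PM core code: every
name below (toy_dev, toy_policy, toy_suspend_all and so on) is
hypothetical, and the model assumes that a failed resume is only
recorded, never treated as fatal.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policies; none of these names exist in the kernel. */
enum toy_policy {
	TOY_CURRENT,		/* suspend every device, ignore past resume errors */
	TOY_SKIP_SUSPEND,	/* $subject patch: skip devices whose resume failed */
	TOY_REFUSE_SUSPEND,	/* Pavel's idea: refuse system suspend altogether */
};

struct toy_dev {
	int resume_error;	/* value the fake resume callback returns */
	bool resume_failed;	/* recorded result of the last resume */
	int suspend_calls;	/* how many times suspend was invoked */
};

/* A resume error is recorded, but never aborts the system resume. */
static void toy_resume_all(struct toy_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		devs[i].resume_failed = devs[i].resume_error != 0;
}

static int toy_suspend_all(struct toy_dev *devs, size_t n,
			   enum toy_policy policy)
{
	size_t i;

	/* Refuse the whole transition if any device failed to resume. */
	if (policy == TOY_REFUSE_SUSPEND)
		for (i = 0; i < n; i++)
			if (devs[i].resume_failed)
				return -EBUSY;

	for (i = 0; i < n; i++) {
		/* Treat a device whose resume failed as still suspended. */
		if (policy == TOY_SKIP_SUSPEND && devs[i].resume_failed)
			continue;
		devs[i].suspend_calls++;
	}
	return 0;
}

/* Walk one device through a failed resume under each policy. */
static void toy_demo(void)
{
	struct toy_dev d = { .resume_error = -EIO };

	assert(toy_suspend_all(&d, 1, TOY_CURRENT) == 0);
	toy_resume_all(&d, 1);			/* resume fails, system carries on */
	assert(d.resume_failed);

	assert(toy_suspend_all(&d, 1, TOY_SKIP_SUSPEND) == 0);
	assert(d.suspend_calls == 1);		/* suspend callback was skipped */

	assert(toy_suspend_all(&d, 1, TOY_REFUSE_SUSPEND) == -EBUSY);

	assert(toy_suspend_all(&d, 1, TOY_CURRENT) == 0);
	assert(d.suspend_calls == 2);		/* suspended again regardless */
}
```

The risk mentioned above shows up in the model as follows: if
resume_error is nonzero but the device actually works, both
TOY_SKIP_SUSPEND and TOY_REFUSE_SUSPEND change what happens to it on
the next suspend, whereas TOY_CURRENT leaves it alone.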

Thanks,
Rafael