Re: PM runtime_error handling missing in many drivers?

From: Brian Norris
Date: Wed Feb 19 2025 - 17:15:57 EST


On Wed, Feb 12, 2025 at 08:29:34PM +0100, Rafael J. Wysocki wrote:
> The reason why runtime_error is there is to prevent runtime PM
> callbacks from being run until something is done about the error,
> under the assumption that running them in that case may make the
> problem worse.

What makes you think it will make the problem worse? That seems like a
rather large assumption to me. What kinds of things do you think could
go wrong that would require the framework to block all future attempts?
Just spam (e.g., logging noise, if -EIO is persistent)? Or something
worse?

And OTOH, there are clearly cases where retrying would be not only
acceptable, but expected -- so special-casing -EAGAIN and -EBUSY, per
another branch of this thread, seems wise.
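
To make the distinction concrete, here is a rough userspace toy model of
a sticky error that exempts -EAGAIN and -EBUSY. It is only a sketch, not
the kernel's runtime PM code: struct model_dev, model_runtime_suspend()
and the other model_* names are all hypothetical, and returning -EINVAL
while an error is latched is an assumption of the model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model; this is not the kernel's struct device. */
struct model_dev {
	int runtime_error;	/* latched error, loosely modeled on runtime_error */
	int suspended;		/* 1 once a suspend callback has succeeded */
	int cb_calls;		/* how many times a callback actually ran */
};

typedef int (*model_cb_t)(struct model_dev *dev);

/* Errors treated as "try again later" rather than as a fault. */
static int model_is_transient(int err)
{
	return err == -EAGAIN || err == -EBUSY;
}

/* Stand-in for whatever recovery step clears the latched error. */
static void model_clear_error(struct model_dev *dev)
{
	dev->runtime_error = 0;
}

static int model_runtime_suspend(struct model_dev *dev, model_cb_t cb)
{
	int ret;

	assert(dev != NULL && cb != NULL);

	/* A latched error blocks the callback entirely. */
	if (dev->runtime_error)
		return -EINVAL;

	ret = cb(dev);
	dev->cb_calls++;

	if (ret == 0) {
		dev->suspended = 1;
		return 0;
	}

	/* Only non-transient failures are latched; transient ones may retry. */
	if (!model_is_transient(ret))
		dev->runtime_error = ret;

	return ret;
}

/* Sample callbacks for exercising the model. */
static int model_cb_busy(struct model_dev *dev) { (void)dev; return -EBUSY; }
static int model_cb_io_error(struct model_dev *dev) { (void)dev; return -EIO; }
static int model_cb_ok(struct model_dev *dev) { (void)dev; return 0; }
```

In this model, a callback returning -EBUSY leaves the device free to try
again, while a single -EIO blocks every later attempt until
model_clear_error() is called.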

I'd also note that AFAICT, there is no similar feature in system PM. If
suspend() fails, we unwind and report the error ... but still allow
future system suspend requests. resume() is even "worse" -- errors are
essentially logged and ignored.
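
For contrast, the system suspend behavior described above can be
sketched the same way. Again this is a userspace toy under stated
assumptions, not the PM core: struct sys_model_dev and
sys_model_suspend_all() are hypothetical names, and the real suspend
path has multiple phases that are not modeled here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model of one device in a system suspend pass. */
struct sys_model_dev {
	int suspended;	/* 1 while the device is suspended */
	int fail_with;	/* error its suspend step returns, 0 for success */
};

/*
 * Suspend devices in order. On the first failure, unwind the devices
 * already suspended (newest first) and report the error. Nothing is
 * latched, so a later call simply tries again.
 */
static int sys_model_suspend_all(struct sys_model_dev *devs, size_t n)
{
	size_t i;

	assert(devs != NULL);

	for (i = 0; i < n; i++) {
		if (devs[i].fail_with) {
			int err = devs[i].fail_with;

			while (i > 0)
				devs[--i].suspended = 0;
			return err;
		}
		devs[i].suspended = 1;
	}

	return 0;
}
```

A failed pass leaves every device active again, and the next call to
sys_model_suspend_all() proceeds without anyone having to clear an error
first.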

> I'm not sure if I see a substantial difference between suspend and
> resume in that respect: If any of them fails, the state of the device
> is kind of unstable. In particular, if resume fails and the device
> doesn't actually resume, something needs to be done about it or it
> just becomes unusable.

To me, it's about the state of the device. If suspend failed, the device
may still be active and functional -- but not power-efficient. If resume
failed, the device may be suspended and non-functional.

But anyway, I don't think I require asymmetry; I'm just more concerned
about unnecessary non-functionality. (Power inefficiency is less
important, as in the worst case, we can at least save our data, reboot,
and try again.)

Brian