Re: [PATCH v4 00/79] Address some issues with PM runtime at media subsystem

From: Johan Hovold
Date: Wed Apr 28 2021 - 11:50:07 EST


On Wed, Apr 28, 2021 at 04:51:21PM +0200, Mauro Carvalho Chehab wrote:
> During the review of the patches from unm.edu, one of the patterns
> I noticed is the amount of patches trying to fix pm_runtime_get_sync()
> calls.
>
> After analyzing the feedback from version 1 of this series, I noticed
> a few other weird behaviors at the PM runtime resume code. So, this
> series start addressing some bugs and issues at the current code.
> Then, it gets rid of pm_runtime_get_sync() at the media subsystem
> (with 2 exceptions).
>
> It should be noticed that
> Commit dd8088d5a896 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")
> added a new method to does a pm_runtime get, which increments
> the usage count only on success.
>
> The rationale of getting rid of pm_runtime_get_sync() is:
>
> 1. despite its name, this is actually a PM runtime resume call,
> but some developers didn't seem to realize that, as I got this
> pattern on some drivers:
>
> pm_runtime_get_sync(&client->dev);
> pm_runtime_disable(&client->dev);
> pm_runtime_set_suspended(&client->dev);
> pm_runtime_put_noidle(&client->dev);
>
> It makes no sense to resume PM just to suspend it again ;-)

This is perfectly alright. Take a look at ov7740_remove() for example:

pm_runtime_get_sync(&client->dev);
pm_runtime_disable(&client->dev);
pm_runtime_set_suspended(&client->dev);
pm_runtime_put_noidle(&client->dev);

ov7740_set_power(ov7740, 0);

There's an explicit power-off after balancing the PM count and this will
work regardless of the power state when entering this function.

So this has nothing to do with pm_runtime_get_sync() per se.

> 2. Usual *_get() methods only increment their use count on success,
> but pm_runtime_get_sync() increments it unconditionally. Due to
> that, several drivers were mistakenly not calling
> pm_runtime_put_noidle() when it fails;

Sure, but pm_runtime_get_async() also works this way. You just won't be
notified if the async resume fails.

> 3. The name of the new variant is a lot clearer:
> pm_runtime_resume_and_get()
> As its same clearly says that this is a PM runtime resume function,
> that also increments the usage counter on success;

It also introduced an inconsistency in the API and does not pair as well
with the pm_runtime_put variants.

> 4. Consistency: we did similar changes subsystem wide with
> for instance strlcpy() and strcpy() that got replaced by
> strscpy(). Having all drivers using the same known-to-be-safe
> methods is a good thing;

It's not known to be safe; there are ways to get also this interface
wrong as for example this series has shown.

> 5. Prevent newer drivers to copy-and-paste a code that it would
> be easier to break if they don't truly understand what's behind
> the scenes.

Cargo-cult programming always runs that risk.

> This series replace places pm_runtime_get_sync(), by calling
> pm_runtime_resume_and_get() instead.
>
> This should help to avoid future mistakes like that, as people
> tend to use the existing drivers as examples for newer ones.

The only valid point about and use for pm_runtime_resume_and_get() is to
avoid leaking a PM usage count reference in the unlikely case that
resume fails (something which hardly any driver implements recovery
from anyway).

It's a convenience wrapper that saves you from writing one extra line in
some cases (depending on how you implement runtime-pm support) and not a
silver bullet against bugs.

> compile-tested only.
>
> Patches 1 to 7 fix some issues that already exists at the current
> PM runtime code;
>
> patches 8 to 20 fix some usage_count problems that still exists
> at the media subsystem;
>
> patches 21 to 78 repaces pm_runtime_get_sync() by
> pm_runtime_resume_and_get();
>
> Patch 79 (and a hunk on patch 78) documents the two exceptions
> where pm_runtime_get_sync() will still be used for now.
>
> ---
>
> v4:
> - Added a couple of additional fixes at existing PM runtime code;
> - Some patches are now more conservative in order to avoid causing
> regressions.
> v3:
> - fix a compilation error;
> v2:
> - addressed pointed issues and fixed a few other PM issues.

This really doesn't say much more than "changed stuff" so kinda hard to
track if review feedback has been taken into account for example.

Johan