Re: [PATCH 38/78] media: i2c: mt9m001: use pm_runtime_resume_and_get()

From: Mauro Carvalho Chehab
Date: Wed Apr 28 2021 - 04:31:57 EST


Em Tue, 27 Apr 2021 14:23:20 +0200
Johan Hovold <johan@xxxxxxxxxx> escreveu:

> On Mon, Apr 26, 2021 at 04:38:40PM +0200, Mauro Carvalho Chehab wrote:
> > Em Sat, 24 Apr 2021 12:00:46 +0200
> > Johan Hovold <johan@xxxxxxxxxx> escreveu:
> >
> > > On Sat, Apr 24, 2021 at 10:24:54AM +0200, Jacopo Mondi wrote:
> > > > Hi Mauro,
> > > >
> > > > On Sat, Apr 24, 2021 at 08:44:48AM +0200, Mauro Carvalho Chehab wrote:
> > > > > Commit dd8088d5a896 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")
> > > > > added pm_runtime_resume_and_get() in order to automatically handle
> > > > > dev->power.usage_count decrement on errors.
> > > > >
> > > > > Use the new API, in order to cleanup the error check logic.
> > > > >
> > > > > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@xxxxxxxxxx>
>
> > > I'd say this kind of mass-conversion is of questionable worth as
> > > pm_runtime_resume_and_get() isn't necessarily an improvement (even if it
> > > may have its use in some places).
> >
> > The main problem is that other parts of the driver's core APIs
> > assume that get object methods will only increment the usage
> > counter if no errors. The pm_runtime_get_sync() is an exception.
> >
> > Its name doesn't help at all: A function like that should, IMHO,
> > be called, instead:
> >
> > pm_runtime_inc_usage_count_and_try_to_resume().
> >
> > Or something similar, in order to make clearer that it always
> > increment the usage count, no matter what. If possible, all drivers
> > should get rid of it too (or alternatively add comments warning
> > people that keeping the usage_count incremented is desired on the
> > very specific places where it is really needed), as it is risky
> > to use something that has a different usage_count increement behavior
> > than other more usual *_get() functions.
>
> pm_runtime_get_sync() has worked this way since it was merged 12 years
> ago, and for someone who's used to this interface this is not such a big
> deal as you seem to think. Sure, you need to remember to put the usage
> counter on errors, but that's it (and the other side of that is that you
> don't need to worry about error handling where it doesn't matter).

Before we have those at PM subsystem, the media had its own way to
set/disable power for their sub-devices. The PCI and USB drivers
still use it, instead of pm_runtime, mostly due to historic reasons.

So, basically, its usage at the media subsystem is restricted to
drivers for embedded systems. The vast majority of drivers supporting
PM runtime are the I2C camera drivers. The camera drivers can be used
interchangeable. So, in practice, the same bridge driver can work
with a lot of different camera models, depending on the hardware
vendors' personal preferences and the desired max resolution.

So, in thesis, all such drivers should behave exactly the same
with regards to PM.

However, on most existing drivers, the pm_runtime was added a
couple of years ago, and by people that are not too familiar
with the PM subsystem.

That probably explains why there were/are several places that
do things like this[1]:

ret = pm_runtime_get_sync(dev);
if (ret < 0)
return ret;

without taking care of calling a pm_runtime_put*() function.

[1] besides the 13 patches made by UCN addressing it on
existing code, I discovered the same pattern on a
couple of other drivers with current upstream code.

That shows a pattern: several media developers are not familiar
with the usage_count behavior for pm_runtime_get functions.

So, doing this work is not only helping to make the PM support
more uniform, but it is also helping to solve existing issues.

> Also note all the pm_runtime_get functions *always* increment the usage
> count even if an async resume may later fail so there is consistency
> here.
>
> And regarding naming, the new pm_resume_and_get() looks completely out
> of place to me since it uses a different naming scheme than the other
> helpers (including the ones that are used to balance the new call).
>
> > With regards to mass-fixing it, I've seen several patches seen
> > to media fixing bugs due to the bad usage_count decrement logic.
> > So, the best is to solve them all at once, and stop using
> > pm_runtime_get_sync() inside the subsystem.
>
> Sure, having the script kiddies patch drivers without understanding what
> they're are really doing is bound to introduce bugs unless it can be
> caught in review.

Yes, but as I pointed, the current code has bugs already.

> You're call, but converting functioning drivers where the authors knew
> what they were doing just because you're not used to the API and risk
> introducing new bugs in the process isn't necessarily a good idea.

The problem is that the above assumption is not necessarily true:
based on the number of drivers that pm_runtime_get_sync() weren't
decrementing usage_count on errors, several driver authors were not
familiar enough with the PM runtime behavior by the time the drivers
were written or converted to use the PM runtime, instead of the
media .s_power()/.s_stream() callbacks.

> Especially since the pm_runtime_get_sync() will continue to be used
> elsewhere, and possibly even in media in cases where you don't need to
> check for errors (e.g. remove()).

Talking about the remove(), I'm not sure if just ignoring errors
there would do the right thing. I mean, if pm_runtime_get_sync()
fails, probably any attempts to disable clocks and other things
that depend on PM runtime will also (silently) fail.

This may put the device on an unknown PM and keep clock lines enabled
after its removal.

Thanks,
Mauro