Re: [PATCH v5 05/12] PM / devfreq: Add support for policy notifiers

From: Matthias Kaehlcke
Date: Thu Aug 02 2018 - 19:14:00 EST


Hi Chanwoo,

On Thu, Aug 02, 2018 at 10:58:59AM +0900, Chanwoo Choi wrote:
> Hi Matthias,
>
> On 2018ë 08ì 02ì 02:08, Matthias Kaehlcke wrote:
> > Hi Chanwoo,
> >
> > On Wed, Aug 01, 2018 at 10:22:16AM +0900, Chanwoo Choi wrote:
> >> On 2018ë 08ì 01ì 04:39, Matthias Kaehlcke wrote:
> >>> On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote:
> >>>> On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote:
> >>>>> Hi Matthias,
> >>>>>
> >>>>> On 2018ë 07ì 07ì 02:53, Matthias Kaehlcke wrote:
> >>>>>> Hi Chanwoo,
> >>>>>>
> >>>>>> On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote:
> >>>>>>
> >>>>>>> Firstly,
> >>>>>>> I'm not sure why devfreq needs the devfreq_verify_within_limits() function.
> >>>>>>>
> >>>>>>> devfreq already used the OPP interface as default. It means that
> >>>>>>> the outside of 'drivers/devfreq' can disable/enable the frequency
> >>>>>>> such as drivers/thermal/devfreq_cooling.c. Also, when some device
> >>>>>>> drivers disable/enable the specific frequency, the devfreq core
> >>>>>>> consider them.
> >>>>>>>
> >>>>>>> So, devfreq doesn't need to devfreq_verify_within_limits() because
> >>>>>>> already support some interface to change the minimum/maximum frequency
> >>>>>>> of devfreq device.
> >>>>>>>
> >>>>>>> In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_with_limits()'
> >>>>>>> to change the minimum/maximum frequency of cpu. some device driver cannot
> >>>>>>> change the minimum/maximum frequency through OPP interface.
> >>>>>>>
> >>>>>>> But, in case of devfreq subsystem, as I explained already, devfreq support
> >>>>>>> the OPP interface as default way. devfreq subsystem doesn't need to add
> >>>>>>> other way to change the minimum/maximum frequency.
> >>>>>>
> >>>>>> Using the OPP interface exclusively works as long as a
> >>>>>> enabling/disabling of OPPs is limited to a single driver
> >>>>>> (drivers/thermal/devfreq_cooling.c). When multiple drivers are
> >>>>>> involved you need a way to resolve conflicts, that's the purpose of
> >>>>>> devfreq_verify_within_limits(). Please let me know if there are
> >>>>>> existing mechanisms for conflict resolution that I overlooked.
> >>>>>>
> >>>>>> Possibly drivers/thermal/devfreq_cooling.c could be migrated to use
> >>>>>> devfreq_verify_within_limits() instead of the OPP interface if
> >>>>>> desired, however this seems beyond the scope of this series.
> >>>>>
> >>>>> Actually, if we uses this approach, it doesn't support the multiple drivers too.
> >>>>> If non throttler drivers uses devfreq_verify_within_limits(), the conflict
> >>>>> happen.
> >>>>
> >>>> As long as drivers limit the max freq there is no conflict, the lowest
> >>>> max freq wins. I expect this to be the usual case, apparently it
> >>>> worked for cpufreq for 10+ years.
> >>>>
> >>>> However it is correct that there would be a conflict if a driver
> >>>> requests a min freq that is higher than the max freq requested by
> >>>> another. In this case devfreq_verify_within_limits() resolves the
> >>>> conflict by raising p->max to the min freq. Not sure if this is
> >>>> something that would ever occur in practice though.
> >>>>
> >>>> If we are really concerned about this case it would also be an option
> >>>> to limit the adjustment to the max frequency.
> >>>>
> >>>>> To resolve the conflict for multiple device driver, maybe OPP interface
> >>>>> have to support 'usage_count' such as clk_enable/disable().
> >>>>
> >>>> This would require supporting negative usage count values, since a OPP
> >>>> should not be enabled if e.g. thermal enables it but the throttler
> >>>> disabled it or viceversa.
> >>>>
> >>>> Theoretically there could also be conflicts, like one driver disabling
> >>>> the higher OPPs and another the lower ones, with the outcome of all
> >>>> OPPs being disabled, which would be a more drastic conflict resolution
> >>>> than that of devfreq_verify_within_limits().
> >>>>
> >>>> Viresh, what do you think about an OPP usage count?
> >>>
> >>> Ping, can we try to reach a conclusion on this or at least keep the
> >>> discussion going?
> >>>
> >>> Not that it matters, but my preferred solution continues to be
> >>> devfreq_verify_within_limits(). It solves conflicts in some way (which
> >>> could be adjusted if needed) and has proven to work in practice for
> >>> 10+ years in a very similar sub-system.
> >>
> >> It is not true. Current cpufreq subsystem doesn't support external OPP
> >> control to enable/disable the OPP entry. If some device driver
> >> controls the OPP entry of cpufreq driver with opp_disable/enable(),
> >> the operation is not working. Because cpufreq considers the limit
> >> through 'cpufreq_verify_with_limits()' only.
> >
> > Ok, we can probably agree that using cpufreq_verify_with_limits()
> > exclusively seems to have worked well for cpufreq, and that in their
> > overall purpose cpufreq and devfreq are similar subsystems.
> >
> > The current throttler series with devfreq_verify_within_limits() takes
> > the enabled OPPs into account, the lowest and highest OPP are used as
> > a starting point for the frequency adjustment and (in theory) the
> > frequency range should only be narrowed by
> > devfreq_verify_within_limits().
> >
> >> As I already commented[1], there is different between cpufreq and devfreq.
> >> [1] https://lkml.org/lkml/2018/7/4/80
> >>
> >> Already, subsystem already used OPP interface in order to control
> >> specific OPP entry. I don't want to provide two outside method
> >> to control the frequency of devfreq driver. It might make the confusion.
> >
> > I understand your point, it would indeed be preferable to have a
> > single method. However I'm not convinced that the OPP interface is
> > a suitable solution, as I exposed earlier in this thread (quoted
> > below).
> >
> > I would like you to at least consider the possibility of changing
> > drivers/thermal/devfreq_cooling.c to devfreq_verify_within_limits().
> > Besides that it's not what is currently used, do you see any technical
> > concerns that would make devfreq_verify_within_limits() an unsuitable
> > or inferior solution?
>
> As we already discussed, devfreq_verify_within_limits() doesn't support
> the multiple outside controllers (e.g., devfreq-cooling.c).

That's incorrect, its purpose is precisely that.

Are you suggesting that cpufreq with its use of
cpufreq_verify_within_limits() (the inspiration for
devfreq_verify_within_limits()) is broken? It is used by cpu_cooling.c
and other drivers when receiving a CPUFREQ_ADJUST event, essentially
what I am proposing with DEVFREQ_ADJUST.

Could you elaborate why this model wouldn't work for devfreq? "OPP
interface is mandatory for devfreq" isn't really a technical argument,
is it mandatory for any other reason than that it is the interface
that is currently used?

> After you are suggesting the throttler core, there are at least two
> outside controllers (e.g., devfreq-cooling.c and throttler driver).
> As I knew the problem about conflict, I cannot agree the temporary
> method. OPP interface is mandatory for devfreq in order to control
> the OPP (frequency/voltage). In this situation, we have to try to
> find the method through OPP interface.

What do you mean with "temporary method"?

We can try to find a method through the OPP interface, but at this
point I'm not convinced that it is technically necessary or even
preferable.

Another inconvenient of the OPP approach for both devfreq-cooling.c
and the throttler is that they have to bother with disabling all OPPs
above/below the max/min (they don't/shouldn't have to care), instead
of just telling devfreq the max/min.

> We can refer to regulator/clock. Multiple device driver can use
> the regulator/clock without any problem. I think that usage of OPP
> is similiar with regulator/clock. As you mentioned, maybe OPP
> would handle the negative count. Although opp_enable/opp_disable()
> have to handle the negative count and opp_enable/opp_disable()
> can support the multiple usage from device drivers, I think that
> this approach is right.

The regulator/clock approach with the typical usage counts seems more
intuitive to me, personally I wouldn't write an interface with
negative usage count if I could reasonably avoid it.

> >> I want to use only OPP interface to enable/disable frequency
> >> even if we have to modify the OPP interface.
> >
> > These are the concerns I raised earlier about a solution with OPP
> > usage counts:
> >
> > "This would require supporting negative usage count values, since a OPP
> > should not be enabled if e.g. thermal enables it but the throttler
> > disabled it or viceversa.
>
> Already replied about negative usage count. I think that negative usage count
> is not problem if this approach could resolve the issue.
>
> >
> > Theoretically there could also be conflicts, like one driver disabling
> > the higher OPPs and another the lower ones, with the outcome of all
> > OPPs being disabled, which would be a more drastic conflict resolution
> > than that of devfreq_verify_within_limits()."
> >
> > What do you think about these points?
>
> It depends on how to use OPP interface on multiple device driver.
> Even if devfreq/opp provides the control method, outside device driver
> are misusing them. It is problem of user.

I wouldn't call it misusing if two independent drivers take
contradictory actions on an interface that doesn't provide
arbitration. How can driver A know that it shouldn't disable OPPs a, b
and c because driver B disabled d, e and f? Who is misusing the
interface, driver A or driver B?

> Instead, if we use the OPP interface, we can check why OPP entry
> is disabled or enabled through usage count.
>
> >
> > The negative usage counts aren't necessarily a dealbreaker in a
> > technical sense, though I'm not a friend of quirky interfaces that
> > don't behave like a typical user would expect (e.g. an OPP isn't
> > necessarily enabled after dev_pm_opp_enable()).
> >
> > I can sent an RFC with OPP usage counts, though due to the above
> > concerns I have doubts it will be well received.
>
> Please add me to Cc list.

Will do

Thanks

Matthias