Re: [PATCH 1/3] PM: domains: Drop the performance state vote for a device at detach

From: Ulf Hansson
Date: Tue Sep 07 2021 - 06:16:54 EST


On Mon, 6 Sept 2021 at 21:33, Dmitry Osipenko <digetx@xxxxxxxxx> wrote:
>
> 06.09.2021 20:34, Ulf Hansson wrote:
> > On Mon, 6 Sept 2021 at 16:11, Dmitry Osipenko <digetx@xxxxxxxxx> wrote:
> >>
> >> 06.09.2021 13:24, Ulf Hansson wrote:
> >>> On Sun, 5 Sept 2021 at 10:26, Dmitry Osipenko <digetx@xxxxxxxxx> wrote:
> >>>>
> >>>> 03.09.2021 17:03, Ulf Hansson wrote:
> >>>>> On Fri, 3 Sept 2021 at 11:58, Dmitry Osipenko <digetx@xxxxxxxxx> wrote:
> >>>>>>
> >>>>>> 03.09.2021 11:22, Ulf Hansson wrote:
> >>>>>>> On Fri, 3 Sept 2021 at 08:01, Dmitry Osipenko <digetx@xxxxxxxxx> wrote:
> >>>>>>>>
> >>>>>>>> 02.09.2021 13:16, Ulf Hansson wrote:
> >>>>>>>>> When a device is detached from its genpd, genpd loses track of the device,
> >>>>>>>>> including its performance state vote that may have been requested for it.
> >>>>>>>>>
> >>>>>>>>> Rather than relying on the consumer driver to drop the performance state
> >>>>>>>>> vote for its device, let's do it internally in genpd when the device is
> >>>>>>>>> getting detached. In this way, we make sure that the aggregation of the
> >>>>>>>>> votes in genpd becomes correct.
> >>>>>>>>
> >>>>>>>> This is a dangerous behaviour in a case where performance state
> >>>>>>>> represents voltage. If hardware is kept active on detachment, say it's
> >>>>>>>> always-on, then it may be a disaster to drop the voltage for the active
> >>>>>>>> hardware.
> >>>>>>>>
> >>>>>>>> It's safe to drop performance state only if you assume that there is a
> >>>>>>>> firmware behind kernel which has its own layer of performance management
> >>>>>>>> and it will prevent the disaster by saying 'nope, I'm not doing this'.
> >>>>>>>>
> >>>>>>>> The performance state should be persistent for a device and it should be
> >>>>>>>> controlled in conjunction with runtime PM. If a platform wants to drop
> >>>>>>>> performance state to zero on detachment, then this behaviour should be
> >>>>>>>> specific to that platform.
> >>>>>>>
> >>>>>>> I understand your concern, but at this point, genpd can't help to fix this.
> >>>>>>>
> >>>>>>> Genpd has no information about the device, unless it's attached to it.
> >>>>>>> For now and for these always on HWs, we simply need to make sure the
> >>>>>>> device stays attached, in one way or the other.
> >>>>>>
> >>>>>> This indeed requires redesigning GENPD to make it more coupled with a
> >>>>>> device, but this is not a real problem for any of the current API users
> >>>>>> AFAIK. Ideally the state should be persistent to make API more universal.
> >>>>>
> >>>>> Right. In fact this has been discussed in the past. In principle, the
> >>>>> idea was to attach to genpd at device registration, rather than at
> >>>>> driver probe.
> >>>>>
> >>>>> Although, this is not very easy to implement - and it seems like the
> >>>>> churn involved has not really been worth it. At least so far.
> >>>>>
> >>>>>>
> >>>>>> Since for today we assume that device should be suspended at the time of
> >>>>>> the detachment (if the default OPP state isn't used), it may be better
> >>>>>> to add a noisy warning message if pstate!=0, keeping the state untouched
> >>>>>> if it's not zero.
> >>>>>
> >>>>> That would just be very silly in my opinion.
> >>>>>
> >>>>> When the device is detached (suspended or not), it may cause its PM
> >>>>> domain to be powered off - and there is really nothing we can do about
> >>>>> that from the genpd point of view.
> >>>>>
> >>>>> As stated, the only current short term solution is to avoid detaching
> >>>>> the device. Anything else would just be papering over the issue.
> >>>>
> >>>> What about re-evaluating the performance state of the domain after
> >>>> detachment instead of setting the state to zero?
> >>>
> >>> I am not suggesting to set the performance state of the genpd to zero,
> >>> but to drop a potential vote for a performance state for the *device*
> >>> that is about to be detached.
> >>
> >> By removing the vote of the *device*, you will drop the performance
> >> state of the genpd. If the device is active and it's wrong to drop its
> >> state, then you may cause damage.
> >>
> >>> Calling genpd_set_performance_state(dev, 0), during detach will have
> >>> the same effect as triggering a re-evaluation of the performance state
> >>> for the genpd, but after the detach.
> >>
> >> Yes
> >>
> >>>> This way the PD driver may
> >>>> take action on detachment if the performance state isn't zero, before the
> >>>> hardware crashes; for example, it may emit a warning.
> >>>
> >>> Not sure I got that. Exactly when do you want to emit a warning and
> >>> for what reason?
> >>>
> >>> Do you want to add a check somewhere to see if
> >>> 'gpd_data->performance_state' is non zero - and then print a warning?
> >>
> >> I want to check the 'gpd_data->performance_state' from the detachment
> >> callback and emit the warning + lock further performance changes in the
> >> PD driver since it's an error condition.
> >
> > Alright, so if I understand correctly, you intend to do the check for
> > the "error condition" of the device in the genpd->detach_dev()
> > callback?
>
> Yes

Okay.

>
> > What exactly do you intend to do beyond this point, if you detect the
> > "error condition"? Locking further changes of the performance state
> > seems fragile too, especially if some other device/driver requires the
> > performance state to be raised. It sounds like you simply need to call
> > BUG_ON() then?
>
> I can lock it to a high performance state.

Alright.

>
> > Also note that a very similar problem exists, *before* the device gets
> > attached in the first place. More precisely, nothing prevents the
> > performance state from being set to a non-compatible value for an
> > always-on HW/device that hasn't been attached yet. So maybe you need
> > to set the maximum performance state at genpd initialization, then
> > use the ->sync_state() callback to verify that all consumers have been
> > attached to the genpd provider, before allowing the state to be
> > changed/lowered?
>
> That is already done by the PD driver.
>
> https://elixir.bootlin.com/linux/latest/source/drivers/soc/tegra/pmc.c#L3790

Yes, I already knew that, but forgot it. :-) Thanks for the pointer.
Let me rethink the approach.

In a way, it kind of sounds like this is a generic problem - so
perhaps we should think of adding a ->withdraw_sync_state() callback
that can be assigned by provider drivers, to get informed when a
consumer driver is getting unbound.
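
To make the idea a bit more concrete, here is a rough user-space model
of it. This is only a sketch and not kernel code: all the toy_* names
are made up for illustration, ->withdraw_sync_state() does not exist
today, and the signature used here is just an assumption. The model
treats the domain state as the highest vote among attached devices,
drops the vote at detach, and lets the provider pin a minimum state
before the vote goes away.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_DEVS 8

/* Toy stand-in for a genpd: one performance state vote per device. */
struct toy_domain {
	unsigned int votes[TOY_MAX_DEVS];  /* per-device vote */
	int attached[TOY_MAX_DEVS];        /* 1 if the slot is attached */
	unsigned int floor;                /* minimum state pinned by provider */
	/* Hypothetical hook: called before a non-zero vote is dropped. */
	void (*withdraw_sync_state)(struct toy_domain *pd, unsigned int vote);
};

/* Aggregate: highest vote of attached devices, never below the floor. */
static unsigned int toy_domain_state(const struct toy_domain *pd)
{
	unsigned int state = pd->floor;

	for (size_t i = 0; i < TOY_MAX_DEVS; i++)
		if (pd->attached[i] && pd->votes[i] > state)
			state = pd->votes[i];
	return state;
}

static void toy_attach(struct toy_domain *pd, size_t dev, unsigned int vote)
{
	assert(dev < TOY_MAX_DEVS);
	pd->attached[dev] = 1;
	pd->votes[dev] = vote;
}

/* Detach: inform the provider first, then drop the device's vote. */
static void toy_detach(struct toy_domain *pd, size_t dev)
{
	assert(dev < TOY_MAX_DEVS && pd->attached[dev]);
	if (pd->withdraw_sync_state && pd->votes[dev] != 0)
		pd->withdraw_sync_state(pd, pd->votes[dev]);
	pd->votes[dev] = 0;
	pd->attached[dev] = 0;
}

/* Example provider policy: keep the domain at the departing vote. */
static void toy_hold_state(struct toy_domain *pd, unsigned int vote)
{
	if (vote > pd->floor)
		pd->floor = vote;
}
```

Without the hook assigned, detaching a device simply lowers the
aggregate to the highest remaining vote. With toy_hold_state()
assigned, the provider keeps the domain at the departing device's
level, which is roughly the "lock it to a high performance state"
behaviour discussed above.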

Kind regards
Uffe