Re: [PATCH] PM / core: Fix supplier device runtime PM usage counter imbalance

From: Rafael J. Wysocki
Date: Wed Jun 13 2018 - 06:58:53 EST


Hi Marek,

On Wed, Jun 13, 2018 at 12:23 PM, Marek Szyprowski
<m.szyprowski@xxxxxxxxxxx> wrote:
> Hi Rafael,
>

[cut]

>>>>> Let's get back to my IOMMU and codec case, mentioned here:
>>>>> https://marc.info/?l=linux-pm&m=152878741527962&w=2
>>>>>
>>>>> Now, after applying your patch, when IOMMU creates a link with
>>>>> DL_FLAG_PM_RUNTIME to the jpeg device (this happens when jpeg device is
>>>>> being probed), it is not IOMMU is not runtime resumed anymore (that's
>>>>> because the patch changes pm_runtime_get_sync to pm_runtime_get_noresume).
>>>>> This means that until jpeg driver enables runtime pm for its device and
>>>>> finally performs runtime pm suspends/resume cycle, the supplier won't be
>>>>> resumed. On the other hand, when I add DL_FLAG_RPM_ACTIVE flag to link
>>>>> creation, the supplier is properly resumed, but then, once the jpeg
>>>>> device probe finishes, the supplier is still in runtime active state
>>>>> until a next runtime suspend/resume cycle of jpeg device.
>>>> That additional suspend-resume cycle should not be necessary in theory
>>>> unless I'm missing something.
>>>>
>>>> The pm_request_idle() call in driver_probe_device() should trigger a
>>>> suspend of the jpeg device after probe (unless blocked by something)
>>>> and that should drop the RPM usage counter of the IOMMU. Next, the
>>>> pm_runtime_put_suppliers() in there should actually suspend it.
>>> I've also would expect such behavior of PM core, but checks on real
>>> hardware gives other results.
>>>
>>>> It looks like the pm_request_idle() doesn't work as expected.
>>> pm_request_idle() calls rpm_idle(), which returns early with -EAGAIN due to
>>> (dev->power.runtime_status != RPM_ACTIVE) check. The device runtime_status
>>> is RPM_SUSPENDED as initially set by pm_runtime_init() on device creation.
>>> Notice that JPEG driver only calls pm_runtime_enable() and nothing more.
>>
>> But is the device really suspended during probe?
>
> This is a runtime pm state of the newly created platform device when driver
> core calls ->probe() from its driver. At that time it is not yet known if
> the driver supports runtime pm or not and typically drivers do some hardware
> access there. Platform device is created from device tree.

By the time the core calls pm_request_idle() in driver_probe_device(),
really_probe() has run already and the driver's ->probe() should have
run and that should damn well know if runtime PM can be supported.

>> Note that "suspend" means basically "not accessible to software", but
>> I guess software needs to access it to set it up, right? If that is
>> the case, maybe the driver should set the initial RPM status of the
>> device to "active" before enabling runtime PM for it? That at least
>> would be consistent with passing DL_FLAG_RPM_ACTIVE to
>> device_link_add().
>>
>> There are drivers that don't actually touch the hardware in ->probe(),
>> but it follows from your description that this is not one of them.
>
> The JPEG driver was just an example, and it actually doesn't touch hw in
> probe. However I would like to have the typical cases working:
>
> 1. runtime pm enabled, no hw access
> 2. runtime pm enabled, some hw access (guarded by either
> pm_runtime_get_sync or pm_runtime_get_noresume+pm_runtime_set_active)
> 3. runtime pm disabled (no runtime pm calls at all), some hw access.
>
> For the last type it is important to enable IOMMU during the probe().

I see.

In that case whoever adds the link needs to do an extra
pm_runtime_resume() on the supplier after the link has been added.

Doing that in device_link_add() itself would adversely affect the case
in which the creator of the link does not want the supplier to be
resumed.

>>>>> If I understand right, the DL_FLAG_RPM_ACTIVE flag should be there from the
>>>>> beginning, but that time it runtime pm part of links worked in a bit
>>>>> different way than now.
>>>> Right, and evidently there are callers depending on the existing behavior.
>>>>
>>>>> Is there any way to keep old behavior?
>>>> Yes, there is, please test the appended v2 of the $subject patch.
>>>>
>>>> That said, I'd like to remove the extra supplier resume from there going
>>>> forward as there may be callers who actually don't want that to happen and
>>>> DL_FLAG_RPM_ACTIVE is there for a purpose.
>>> Frankly, if the current behavior matches the designed behavior of
>>> DL_FLAG_RPM_ACTIVE flag,
>> It doesn't match the DL_FLAG_RPM_ACTIVE exactly as you've already noticed.
>>
>> DL_FLAG_RPM_ACTIVE assumes that the initial RPM status of the device
>> will be RPM_ACTIVE and therefore the suppliers need to be resumed at
>> link creation time. Therefore device_link_add() causes the suppliers
>> to remain in the RPM_ACTIVE state with the rpm_active status bit of
>> the link set, whereas currently they are simply suspended again by the
>> pm_runtime_put_suppliers() in driver_probe_device() and the link is
>> not marked as "rpm_active".
>>
>>> then maybe instead of adding workarounds now, we
>>> should simply fix all existing callers of device_link_add()? 'git grep'
>>> shows
>>> only 6 places where links are created with DL_FLAG_PM_RUNTIME flag, I see no
>>> problem to add DL_FLAG_RPM_ACTIVE there to keep current behavior after a fix
>>> in runtime PM core. The description of DL_FLAG_RPM_ACTIVE should also be
>>> a bit
>>> updated, because I initially thought that it means that the runtime pm
>>> counter
>>> on supplier is increased for the whole lifetime of the device link (it
>>> is not
>>> clear when core will call a corresponding pm_runtime_put()).
>>>
>>> The other question is what need to be fixed to get proper behavior
>>> without the
>>> additional suspend/resume cycle mentioned a few paragraphs above.
>> As stated already, if the driver passes DL_FLAG_RPM_ACTIVE to
>> device_link_add() at probe time, then the initial RPM status of the
>> device being probed is expected to be RPM_ACTIVE.
>
> Okay, then this doesn't match the case of Exynos IOMMU and JPEG (and other
> Exynos multimedia drivers), because the links are created from the
> add_device
> platform bus notifier, which is executed just before ->probe() callback of
> the newly created jpeg/multimedia device. That time of course
> jpeg/multimedia
> driver is not able to enable runtime PM of the handled device yet...

But you can run pm_runtime_resume(supplier) directly from there, right?

>>>> ---
>>>> From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>>>> Subject: [PATCH v2] PM / core: Fix supplier device runtime PM usage counter imbalance
>>>>
>>>> If a device link is added via device_link_add() by the driver of the
>>>> link's consumer device, the supplier's runtime PM usage counter is
>>>> going to be dropped by the pm_runtime_put_suppliers() call in
>>>> driver_probe_device(). However, in that case it is not incremented
>>>> unless the supplier driver is already present and the link is not
>>>> stateless. That leads to a runtime PM usage counter imbalance for
>>>> the supplier device in a few cases.
>>>>
>>>> To prevent that from happening, bump up the supplier runtime
>>>> PM usage counter in device_link_add() for all links with the
>>>> DL_FLAG_PM_RUNTIME flag set that are added at the consumer probe
>>>> time. Use pm_runtime_get_noresume() for that as the callers of
>>>> device_link_add() who want the supplier to be resumed by it are
>>>> expected to pass DL_FLAG_RPM_ACTIVE in flags to it anyway, but
>>>> additionally resume the supplier if the link is added during
>>>> consumer driver probe to retain the existing behavior for the
>>>> callers depending on it.
>>>>
>>>> Fixes: 21d5c57b3726 (PM / runtime: Use device links)
>>>> Reported-by: Ulf Hansson <ulf.hansson@xxxxxxxxxx>
>>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>>> I've tested this version of the patch and it keeps the current behavior for
>>> links created with DL_FLAG_PM_RUNTIME flag. The questions is if we really
>>> want it?
>> I think so.
>>
>> Basically, there are two changes at hand: fixing the behavior for
>> stateless links (and for stateful ones if the supplier driver is not
>> present, but that arguably is a corner case) and the behavior change
>> for stateful links (with supplier drivers present).
>>
>> Arguably, the former is more important than the latter and I'd like to
>> be able to push that fix into -stable without dependencies. The
>> latter can be done when all of the current callers depending on the
>> existing behavior have been adjusted.
>>
>> So, I'm going to add a Tested-by from you to this patch, if you don't
>> mind, and queue it up.
>
> Okay, fine for me.

Thanks!