Re: [PATCH] driver core: Ensure proper suspend/resume ordering
From: Grygorii Strashko
Date: Wed Sep 16 2015 - 09:40:17 EST
Hi Thierry, Alan, Rafael,
On 09/12/2015 01:28 AM, Rafael J. Wysocki wrote:
> On Friday, September 11, 2015 03:01:14 PM Alan Stern wrote:
>> On Fri, 11 Sep 2015, Thierry Reding wrote:
>>
>>> On Fri, Sep 11, 2015 at 12:08:02AM +0200, Rafael J. Wysocki wrote:
>>>> On Thursday, September 10, 2015 12:19:03 PM Thierry Reding wrote:
>>>>> From: Thierry Reding <treding@xxxxxxxxxx>
>>>>>
>>>>> Deferred probe can lead to strange situations where a device that is a
>>>>> dependency for others will be moved to the end of the dpm_list. At the
>>>>> same time the dependers may not be moved because at the time they will
>>>>> be probed the dependee may already have been successfully reprobed and
>>>>> they will not have to defer the probe themselves.
>>>>
>>>> So there's a bug in the implementation of deferred probing IMO.
>>>
>>> Well, yeah. The root problem here is that we don't have dependency
>>> information and deferred probing is supposed to fix that. It does so
>>> fairly well, but it breaks in this particular case.
Actually this is the old problem and deferred probing just makes it more
reproducible.
Lets say, we have device P-producer and device C-consumer.
- devices are created/registered in the following order (It doesn't matter
how they are created - dt, platform,..)
-- device C created
-- device P created
- devices probing order (legacy probing order depends on Makefiles and
initcalls - no deferred probing)
-- device P probed
-- device C probed
- suspend order
-- device P suspended
-- device C suspended (ops, device C may still need device P)
To W/A such issues:
- driver's suspend code can be shifted between suspend layers (from
.suspend() to .suspend_noirq(), for example)
- device's registration order can be changed manually in DT or platform code
- can play with initcalls
May be it was enough before, but now it's not :(, simply because, now
there are much more generic/common frameworks in Kernel:
gpio, regulators, dma ++ clk, syscon, phy, extcon, component, display framework..
As result, It introduces more devices and dependencies between devices and,
therefore, suspend ordering issue can be hit more often.
>>>
>>>>> One example where this happens is the Jetson TK1 board (Tegra124). The
>>>>> gpio-keys driver exposes the power key of the board as an input device
>>>>> that can also be used as a wakeup source. Commit 17cdddf0fb68 ("ARM:
>>>>> tegra: Add gpio-ranges property") results in the gpio-tegra driver
>>>>> deferring probe because one of its dependencies, the pinctrl-tegra
>>>>> driver, has not successfully completed probing. Currently the deferred
>>>>> probe code will move the corresponding gpio-tegra device to the end of
>>>>> the dpm_list, but by the time the gpio-keys device, depending on the
>>>>> gpio-tegra device, is probed, gpio-tegra has already been reprobed, so
>>>>> the gpio-keys device is not moved to the end of dpm_list itself. As a
>>>>> result, the suspend ordering becomes pinctrl-tegra -> gpio-keys ->
>>>>> gpio-tegra. That's problematic because the gpio-keys driver requests
>>>>> the power key to be a wakeup source. However, the programming of the
>>>>> wakeup interrupt registers happens in the gpio-tegra driver's suspend
>>>>> callback, which is now called before that of the gpio-keys driver. The
>>>>> result is that the wrong values are programmed and leaves the system
>>>>> unable to be resumed using the power key.
>>>>>
>>>>> To fix this situation, always move devices to the end of the dpm_list
>>>>> before probing them. Technically this should only be done for devices
>>>>> that have been successfully probed, but that won't work for recursive
>>>>> probing of devices (think an I2C master that instantiates children in
>>>>> its ->probe()). Effectively the dpm_list will end up ordered the same
>>>>> way that devices were probed, hence taking care of dependencies.
>>
>> I'm not worried about the overhead involved in moving a device to the
>> end of the list every time it is probed. That ought to be relatively
>> small.
>>
>> There are a few things to watch out for. Since the dpm_list gets
>> modified during system sleep transitions, we would have to make sure
>> that nothing gets probed during those times. In principle, that's what
>> the "prepare" stage is meant for, but there's still a race. As long as
>> no other kernel thread (such as the deferred probing mechanism) tries
>> to probe a device once everything has been frozen, we should be okay.
>> But if not, there will be trouble -- after the ->prepare callback runs,
>> the device is no longer on the dpm_list and so we don't want this patch
>> to put it back on that list.
>
I think, It should prohibited to probe devices during suspend/hibernation.
And solution introduced in this patch might help to fix it -
in general, we could do :
- add sync point on suspend enter: wait_for_device_probe() and
- prohibit probing: move all devices which will request probing into
deferred_probe list
- one suspend exit: allow probing and do driver_deferred_probe_trigger
>
>> There's also an issue about other types of dependencies. For instance,
>> it's conceivable that device B might be discovered and depend on device
>> A, even before A has been bound to a driver. (B might be discovered by
>> A's subsystem rather than A's driver.) In that case, moving A to the
>> end of the list would cause B to come before A even though B depends on
>> A. Of course, deferred probing already has this problem.
>
> That may actually happen for PCIe ports and devices below them AFAICS.
>
> Devices below PCIe ports are discovered by the PCI subsystem and the PCIe
> ports need not be probed before those devices are probed.
I'd like to mention here that this patch will work only
if dmp_list will be filled according device creation order ("parent<-child" dependencies)
*AND* according device's probing order ("supplier<-consumer").
So, if there is the case when Parent device can be probed AFTER its children
- it will not work, because "parent<-child" dependencies will not be tracked
any more :( Sry, I could not even imagine that such crazy case exist :'(
Are there any other subsystems with the same behavior like PCI?
If not - probably, it could be fixed in PCI subsystem using device_pm_move_after() or
device_move() in PCIe ports probe.
if yes - ... maybe we can scan/re-check and reorder dpm_list on suspend enter and
restore ("parent<-child" dependencies).
>
>> An easy way to check for this sort of thing would be to verify that a
>> device about to be probed doesn't have any children. This wouldn't
>> catch all the potential dependencies, but it would be a reasonable
>> start.
>
> That would address the PCIe ports issue.
>
that might work also.
Truth is that smth. need to be done 100%. Personally, I was hit by this issue also,
and it cost me 3 hours of debugging and I came up with the same patch as
Bill Huang, then spent some time trying to understand what is wrong with PCI
- finally, I've just changed the order of my devices in DT :)
Also, I think, it will be good to have this patch in -next to collect more feedbacks.
--
regards,
-grygorii
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/