Re: [PATCH] driver core: Drop devices_kset_move_last() call from really_probe()

From: Rafael J. Wysocki
Date: Tue Jul 10 2018 - 06:29:31 EST


On Tue, Jul 10, 2018 at 12:06 AM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote:
> [+cc Kishon]
>
> On Mon, Jul 9, 2018 at 4:35 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
>>
>> On Mon, Jul 9, 2018 at 3:57 PM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote:
>> > On Fri, Jul 6, 2018 at 5:01 AM Rafael J. Wysocki <rjw@xxxxxxxxxxxxx> wrote:
>> >>
>> >> From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>> >>
>> >> The devices_kset_move_last() call in really_probe() is a mistake
>> >> as it may cause parents to follow children in the devices_kset list
>> >> which then causes system shutdown to fail. Namely, if a device has
>> >> children before really_probe() is called for it (which is not
>> >> uncommon), that call will cause it to be reordered after the children
>> >> in the devices_kset list and the ordering of that list will not
>> >> reflect the correct device shutdown order.
>> >>
>> >> Also it causes the devices_kset list to be constantly reordered
>> >> until all drivers have been probed which is totally pointless
>> >> overhead in the majority of cases.
>> >>
>> >> For that reason, revert the really_probe() modifications made by
>> >> commit 52cdbdd49853.
>> >
>> > I'm sure you've considered this, but I can't figure out whether this
>> > patch will reintroduce the problem that was solved by 52cdbdd49853.
>> > That patch updated two places: (1) really_probe(), the change you're
>> > reverting here, and (2) device_move().
>> >
>> > device_move() is only called from 4-5 places, none of which look
>> > related to the problem fixed by 52cdbdd49853, so it seems like that
>> > problem was probably resolved by the hunk you're reverting.
>>
>> That's right, but I don't want to revert all of it. The other parts
>> of it are kind of useful as they make the handling of the devices_kset
>> list be consistent with the handling of dpm_list.
>>
>> The hunk I'm reverting, however, is completely off. It not only is
>> incorrect (as per the above), but it also causes the devices_kset list
>> and dpm_list to be handled differently.
>
> If I understand correctly, you are saying:
>
> - the 52cdbdd49853 really_probe() hunk fixed a problem, but

It papered over a shutdown failure. Calling it a "fix" is an overstatement IMO.

> - that hunk was the wrong fix for it, and
> - this patch removes the wrong fix (and probably reintroduces the problem)
>
> If devices_kset is supposed to be ordered so children follow parents,
> I agree the really_probe() hunk doesn't make much sense because the
> parent/child relation is determined by the circuit design, not by the
> probe order.

Exactly.

> It just seems like it's worth being clear that we're reintroducing the
> problem fixed by 52cdbdd49853, so it needs to be solved a different
> way.

OK

> Ideally that would be done before this patch so there's not a
> regression, and this changelog could mention what's happening.

Well, commit 52cdbdd49853 introduced a regression by itself, but that
regression has only been reported recently.

I don't really want to go into a discussion on which of the two
regressions is more painful, but then IMO going back to the state from
before commit 52cdbdd49853 is fair enough. Hence the patch.

>> It had attempted to fix something, but it failed miserably at that.
>
> If you're saying that 52cdbdd49853 *tried* to fix a DRA7XX_evm reboot
> problem, but in fact, it did not fix that problem, then I guess there
> should be no issue with reverting that hunk.

Again, it hid the reboot problem by changing the core in a way that
led to a shutdown regression elsewhere.

Also it looks like the platform(s) having that reboot issue do(es)n't
really do system-wide suspend/resume, because that "fix" obviously
doesn't help there.

>> >> Fixes: 52cdbdd49853 (driver core: correct device's shutdown order)
>> >> Link: https://lore.kernel.org/lkml/CAFgQCTt7VfqM=UyCnvNFxrSw8Z6cUtAi3HUwR4_xPAc03SgHjQ@xxxxxxxxxxxxxx/
>> >> Reported-by: Pingfan Liu <kernelfans@xxxxxxxxx>
>> >> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>> >> ---
>> >> drivers/base/dd.c | 8 --------
>> >> 1 file changed, 8 deletions(-)
>> >>
>> >> Index: linux-pm/drivers/base/dd.c
>> >> ===================================================================
>> >> --- linux-pm.orig/drivers/base/dd.c
>> >> +++ linux-pm/drivers/base/dd.c
>> >> @@ -434,14 +434,6 @@ re_probe:
>> >> goto probe_failed;
>> >> }
>> >>
>> >> - /*
>> >> - * Ensure devices are listed in devices_kset in correct order
>> >> - * It's important to move Dev to the end of devices_kset before
>> >> - * calling .probe, because it could be recursive and parent Dev
>> >> - * should always go first
>> >> - */
>> >> - devices_kset_move_last(dev);
>> >> -
>> >> if (dev->bus->probe) {
>> >> ret = dev->bus->probe(dev);
>> >> if (ret)
>> >>