Re: [linux-pm] Fundamental flaw in system suspend, exposed by freezerremoval

From: Alan Stern
Date: Tue Feb 26 2008 - 10:49:32 EST


On Tue, 26 Feb 2008, Rafael J. Wysocki wrote:

> > > IMO the device driver should assure that no new children will be registered
> > > concurrently with the ->suspend() method (IOW, ->suspend() should wait for
> > > all such registrations to complete and should prevent any new ones from
> > > being started) and it should make it impossible to register any new children
> > > after ->suspend() has run. It's the driver's problem how to achieve that.
> >
> > Exactly; this has to be added to the PM documentation.
>
> Into Documentation/power/devices.txt, I gather?

Yes.

> > > > The PM core could help detect errors here. If it tries to suspend a
> > > > device and sees that the device's parent is already suspended, then the
> > > > parent's driver has a bug.
> > >
> > > Yes, I think we ought to fail the suspend in such cases. Still, that's not
> > > sufficient to prevent a child from being registered after we've run
> > > dpm_suspend(). For this reason, we could also leave dpm_suspend() with
> > > dpm_list_mtx held and not release it until the next dpm_resume() is run.
> >
> > The pm_sleep_rwsem will do a better job of catching such errors.
>
> But we should not leave a window between releasing dpm_list_mtx and taking
> pm_sleep_rwsem. Either that, or we should make sure that dpm_active is
> empty after acquiring pm_sleep_rwsem.

I've got some ideas on how to implement this.

We can add a new field "suspend_called" to dev->power. It would be
owned by the PM core (protect by dpm_list_mtx) and read-only to
drivers. Normally it will contain 0, but when the suspend method is
running we set it to SUSPEND_RUNNING and when the method returns
successfully we set it to SUSPEND_DONE. Before calling the resume
method we set it back to 0. Drivers can use this field as an easy way
of checking that all the child devices have been suspended.

When a new device is registered we check its parent's suspend_called
value. If it is SUSPEND_DONE then the caller has a bug and we have to
fail the registration. If it is SUSPEND_RUNNING then the registration
is legal, but we remember what happened. Then when the
currently-running suspend method returns and we reacquire the
dpm_list_mtx, we will realize that a race was lost. If the method
completed successfully (which it shouldn't) we can resume that device
immediately without ever taking it off the dpm_active list; but either
way we should continue the suspend loop. Now the new child will be at
the end of the dpm_active_list, so it will be suspended before the
parent is reached again.

This way we can recover from drivers that are willing to suspend their
device even though there are unsuspended children. The only drawback
will be that for a short time the child will be active while its parent
is suspended.

We should not abort the entire sleep transition simply because we lost
a race. With this scheme we won't even need the pm_sleep_rwsem; the
dpm_list_mtx will provide all the necessary protection.

This is more intricate than it should be. It would have been better to
have had "disable_new_children" and "enable_new_children" methods from
the beginning; then there wouldn't be any races at all. That's life...

The one tricky thing to watch out for is when a suspend or resume
method wants to unregister the device being suspended or resumed. Even
that should be doable (set suspend_called to UNREGISTERED or something
like that).

> > > That will potentially cause some trouble to CPU hotplug cotifiers, but we can
> > > handle that, for example, by using the in_suspend_context() test.
> >
> > Do they need to register new CPUs at some point? There ought to be a
> > way to handle that.
>
> No, they don't, but there are some CPU-related device objects that get
> uregistered/registered. Still, all of this work is really redundant if the CPU
> in question comes back up during the resume, so it should be avoided in
> general. The CPU hotplug notifiers should only unregister those objects if
> the CPU hasn't gone on line during the resume and they have all information
> necessary for discovering that.

Unregistration should always be allowed, and registration should be
allowed whenever the parent isn't suspended. For devices with no
parent, we can imagine there is a fictitious parent at the root of the
device tree. Conceptually it gets suspended after every real device
and resumed before. Maybe even before dpm_power_up(), meaning that
devices with no parent could be registered by a resume_early method.

When your lock-removal stuff gets into Greg's tree, I'll write all
this. Sound good?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/