Re: [Update][RFC/RFT][PATCH v3 2/5] driver core: Functional dependencies tracking support
From: Rafael J. Wysocki
Date: Fri Sep 23 2016 - 09:36:20 EST
On Tuesday, September 20, 2016 12:46:30 AM Lukas Wunner wrote:
> On Fri, Sep 16, 2016 at 02:33:55PM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> >
> > Currently, there is a problem with taking functional dependencies
> > between into account.
> ^
> devices
>
> > What I mean by a "functional dependency" is when the driver of device
> > B needs device A to be functional and (generally) its driver to be
> > present in order to work properly. This has certain consequences
> > for power management (suspend/resume and runtime PM ordering) and
> > shutdown ordering of these devices. In general, it also implies that
> > the driver of A needs to be working for B to be probed successfully
> > and it cannot be unbound from the device before the B's driver.
> ^^
> Trailing whitespace.
>
> > Support for representing those functional dependencies between
> > devices is added here to allow the driver core to track them and act
> > on them in certain cases where applicable.
> >
> > The argument for doing that in the driver core is that there are
> > quite a few distinct use cases involving device dependencies, they
> > are relatively hard to get right in a driver (if one wants to
> > address all of them properly) and it only gets worse if multiplied
> > by the number of drivers potentially needing to do it. Morever, at
> > least one case (asynchronous system suspend/resume) cannot be handled
> > in a single driver at all, because it requires the driver of A to
> > wait for B to suspend (during system suspend) and the driver of B to
> > wait for A to resume (during system resume).
> >
> > For this reason, represent dependencies between devices as "links",
> > with the help of struct device_link objects each containing pointers
> > to the "linked" devices, a list node for each of them, status
> > information, flags, a lock and an RCU head for synchronization.
> >
> > Also add two new list heads, links_to_consumers and links_to_suppliers,
> > to struct device to represent the lists of links to the devices that
> > depend on the given one (consumers) and to the devices depended on
> > by it (suppliers), respectively.
> >
> > The entire data structure consisting of all of the lists of link
> > objects for all devices is protected by SRCU (for list walking)
> > and a by mutex (for link object addition/removal). In addition
> ^^^^
> by a
>
> > to that, each link object has an internal status field whose
> > value reflects what's happening to the devices pointed to by
> > the link. That status field is protected by an internal spinlock.
>
> More precisely, the status field tracks the driver boundness of
> the two devices comprising the link. ("what's happening to the
> devices" is a bit broad, this is really about the *drivers*.)
Basically OK, but it will be more than that most likely due to the bug
reported by Marek.
And thanks for the typo fixes. :-)
> > New links are added by calling device_link_add() which takes four
> > arguments: pointers to the devices in question, the initial status
> > of the link and flags. In particular, if DEVICE_LINK_STATELESS is
> > set in the flags, the link status is not to be taken into account
> > for this link and the driver core will not manage it. In turn, if
> > DEVICE_LINK_AUTOREMOVE is set in the flags, the driver core will
> > remove the link automatically when the consumer device driver
> > unbinds from it.
> >
> > One of the actions carried out by device_link_add() is to reorder
> > the lists used for device shutdown and system suspend/resume to
> > put the consumer device along with all of its children and all of
> > its consumers (and so on, recursively) to the ends of those list
> ^
> s
>
> > in order to ensure the right ordering between all of the supplier
> > and consumer devices.
> >
> > For this reason, it is not possible to create a link between two
> > devices if the would-be supplier device already depends on the
> > would-be consumer device as either a direct descendant of it or a
> > consumer of one of its direct descendants or one of its consumers
> > and so on.
> >
> > It also is impossible to create a link between a parent and a child
> > device (in any direction).
> >
> > There are two types of link objects, persistent and non-persistent.
> > The persistent ones stay around until one of the target devices is
> > deleted, while the non-persistent ones are removed automatically when
> > the consumer driver unbinds from its device (ie. they are assumed to
> > be valid only as long as the consumer device has a driver bound to
> > it). Persistent links are created by default and non-persistent
> > links are created when the DEVICE_LINK_AUTOREMOVE flag is passed
> > to device_link_add().
> >
> > Both persistent and non-persistent device links can be deleted
> > explicitly with the help of device_link_del().
> >
> > Links created without the DEVICE_LINK_STATELESS flag set are managed
> > by the driver core using a simple state machine. There are 5 states
> > each link can be in: DORMANT (unused), AVAILABLE (the supplier driver
> > is present and functional), CONSUMER_PROBE (the consumer driver is
> > probing), ACTIVE (both supplier and consumer drivers are present and
> > functional), and SUPPLIER_UNBIND (the supplier driver is unbinding).
> > The driver core updates the link state automatically depending on
> > what happens to the linked devices and for each link state specific
> > actions are taken in addition to that.
> >
> > For example, if the supplier driver unbinds from its device, the
> > driver core will also unbind the drivers of all of its consumers
> > automatically under the assumption that they cannot function
> > properly without the supplier. Analogously, the driver core will
> > only allow the consumer driver to bind to its device is the
> ^^
> if
>
> > supplier driver is present and functional (ie. the link is in
> > the AVAILABLE state). If that's not the case, it will rely on
> > the existing deferred probing mechanism to wait for the supplier
> > driver to become available.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> > Tested-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
> > ---
> [snip]
> > +/**
> > + * device_link_add - Create a link between two devices.
> > + * @consumer: Consumer end of the link.
> > + * @supplier: Supplier end of the link.
> > + * @status: The initial status of the link.
> > + * @flags: Link flags.
> > + *
> > + * If the DEVICE_LINK_STATELESS flag is set, @status is ignored. Otherwise,
> > + * the caller is responsible for ensuring that @status reflects the current
> > + * status of both @consumer and @supplier.
> > + *
> > + * If the DEVICE_LINK_AUTOREMOVE is set, the link will be removed automatically
> > + * when the consumer device driver unbinds from it. The combination of both
> > + * DEVICE_LINK_AUTOREMOVE and DEVICE_LINK_STATELESS set is invalid and will
> > + * cause NULL to be returned.
> > + *
> > + * A side effect of the link creation is re-ordering of dpm_list and the
> > + * devices_kset list by moving the consumer device and all devices depending
> > + * on it to the ends of these lists.
> > + */
> > +struct device_link *device_link_add(struct device *consumer,
> > + struct device *supplier,
> > + enum device_link_status status, u32 flags)
> > +{
> > + struct device_link *link;
> > +
> > + if (!consumer || !supplier || supplier == consumer->parent ||
>
> So a link from the child to the parent is forbidden, but e.g. to the
> grandparent is not. Either this should be forbidden for all ancestors
> of the child or none. I'd vote for the latter. I don't see a reason
> why this shouldn't be allowed. It doesn't introduce a dependency loop
> and it might be useful if someone needs a driver presence dependency
> between parent and child.
While I can remove this check, I don't see why anyone would use links for
that instead of adding a device flag to cause the core to wait for the
parent to be probed if the child requires that.
> [snip]
> > +static int device_links_read_lock(void)
> > +{
> > + return srcu_read_lock(&device_links_srcu);
> > +}
> > +
> > +static void device_links_read_unlock(int idx)
> > +{
> > + return srcu_read_unlock(&device_links_srcu, idx);
> > +}
>
> How about declaring the above two functions inline?
Well, does it really matter? The complier should be able to make them inline
anyway if it sees the point.
> [snip]
> > +/**
> > + * device_links_check_suppliers - Check supplier devices for this one.
>
> The short description is mostly a repetition of the function name
> and thus not very informative.
>
> Imagine someone coming here from driver_probe_device(), where this
> function is invoked. They should immediately get an idea what
> the function does.
>
> How about: "Check presence of supplier drivers"
Sounds good.
> > + * @dev: Consumer device.
> > + *
> > + * Check links from this device to any suppliers. Walk the list of the device's
> > + * consumer links and see if all of the suppliers are available. If not, simply
> ^^^^^^^^
> "supplier links and see if all if them are available."
>
> [snip]
> > +/**
> > + * device_links_no_driver - Update links of a device without a driver.
> > + * @dev: Device without a drvier.
> > + *
> > + * Delete all non-persistent links from this device to any suppliers.
> > + *
> > + * Persistent links stay around, but their status is changed to "available",
> > + * unless they already are in the "supplier unbind in progress" state in which
> > + * case they need not be updated.
>
> Kerneldoc refers to persistent links, a term which no longer exists in v3.
It does exist.
The flag is not present any more, but the links may still be persistent
(and the changelog talks about "persistent" and "non-persistent" links too).
> [snip]
> > +void device_links_no_driver(struct device *dev)
> > +{
> > + struct device_link *link, *ln;
> > +
> > + mutex_lock(&device_links_lock);
> > +
> > + list_for_each_entry_safe_reverse(link, ln, &dev->links_to_suppliers, c_node) {
> > + if (link->flags & DEVICE_LINK_STATELESS)
> > + continue;
> > +
> > + if (link->flags & DEVICE_LINK_AUTOREMOVE) {
> > + __device_link_del(link);
>
> The link will be autoremoved not only when the consumer unbinds,
> but also when probing the consumer fails.
>
> Looks like a bug.
It really was intentional, because the use case I see for AUTOREMOVE (and
the only one to be honest) is when the link is created by the consumer
probe in which case it wants to avoid worrying about the cleanup part.
Which also is applicable to the cleanup when the probe fails IMO.
> Without the autoremove functionality, this function shrinks to just a
> few LoC and it's probably okay to just duplicate that code and copy it
> into device_links_driver_gone(). Then you'd have two separate
> functions, one for the error path in really_probe() and the other for
> __device_release_driver().
>
> And you could then also give the functions names that match where
> they're called from, e.g. device_links_probe_failed() and
> device_links_driver_unbound(). Because the existing names
> device_links_driver_gone() and device_links_no_driver() are very
> similar and thus a bit confusing.
I can drop AUTOREMOVE entirely if it is problematic.
> [snip]
> > +void device_links_unbind_consumers(struct device *dev)
> > +{
> > + struct device_link *link;
> > + int idx;
> > +
> > + start:
> > + idx = device_links_read_lock();
> > +
> > + list_for_each_entry_rcu(link, &dev->links_to_consumers, s_node) {
> > + enum device_link_status status;
> > +
> > + if (link->flags & DEVICE_LINK_STATELESS)
> > + continue;
> > +
> > + spin_lock(&link->lock);
> > + status = link->status;
> > + if (status == DEVICE_LINK_CONSUMER_PROBE) {
> > + spin_unlock(&link->lock);
> > +
> > + device_links_read_unlock(idx);
> > +
> > + wait_for_device_probe();
> > + goto start;
> > + }
> > + link->status = DEVICE_LINK_SUPPLIER_UNBIND;
> > + if (status == DEVICE_LINK_ACTIVE) {
> > + struct device *consumer = link->consumer;
> > +
> > + get_device(consumer);
>
> As long as the struct device_link exists, a ref is held on the
> supplier and consumer. Why acquire another ref here?
This comment has just been withdrawn. :-)
> > + spin_unlock(&link->lock);
>
> The lock is released both at the beginning of this if-block and
> immediately after the if-block (in case the if-condition is false).
> Why not simply release the lock *before* the if-block?
Because the get_device() needs to be done under the lock.
>
> [snip]
> > @@ -1233,6 +1680,7 @@ void device_del(struct device *dev)
> > {
> > struct device *parent = dev->parent;
> > struct class_interface *class_intf;
> > + struct device_link *link, *ln;
> >
> > /* Notify clients of device removal. This call must come
> > * before dpm_sysfs_remove().
> > @@ -1240,6 +1688,30 @@ void device_del(struct device *dev)
> > if (dev->bus)
> > blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
> > BUS_NOTIFY_DEL_DEVICE, dev);
> > +
> > + /*
> > + * Delete all of the remaining links from this device to any other
> > + * devices (either consumers or suppliers).
> > + *
> > + * This requires that all links be dormant, so warn if that's no the
> > + * case.
> > + */
>
> How about moving this to a separate function, e.g. device_links_purge(dev)?
>
> In all other cases you've created a new function even though it's only
> called from a single place, which makes sense because it avoids cluttering
> up these central functions like device_del().
OK
> > + mutex_lock(&device_links_lock);
> > +
> > + list_for_each_entry_safe_reverse(link, ln, &dev->links_to_suppliers, c_node) {
> > + WARN_ON(link->status != DEVICE_LINK_DORMANT &&
> > + !(link->flags & DEVICE_LINK_STATELESS));
> > + __device_link_del(link);
> > + }
>
> Shouldn't it also be legal for the supplier links to be in
> DEVICE_LINK_AVAILABLE state upon removal of a consumer device?
>
> (And perhaps also DEVICE_LINK_SUPPLIER_UNBIND?)
>
> Looks like a bug.
But this is done after removing the supplier driver, so the state should be
DORMANT (unless the link is stateless), shouldn't it?
> [snip]
> > --- linux-pm.orig/drivers/base/dd.c
> > +++ linux-pm/drivers/base/dd.c
> > @@ -249,6 +249,7 @@ static void driver_bound(struct device *
> > __func__, dev_name(dev));
> >
> > klist_add_tail(&dev->p->knode_driver, &dev->driver->p->klist_devices);
> > + device_links_driver_bound(dev);
> >
> > device_pm_check_callbacks(dev);
> >
> > @@ -399,6 +400,7 @@ probe_failed:
> > blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
> > BUS_NOTIFY_DRIVER_NOT_BOUND, dev);
> > pinctrl_bind_failed:
> > + device_links_no_driver(dev);
> > devres_release_all(dev);
> > driver_sysfs_remove(dev);
> > dev->driver = NULL;
> > @@ -489,6 +491,10 @@ int driver_probe_device(struct device_dr
> > if (!device_is_registered(dev))
> > return -ENODEV;
> >
> > + ret = device_links_check_suppliers(dev);
> > + if (ret)
> > + return ret;
> > +
>
> I think clarity would improve if you would move the call to
> device_links_check_suppliers() down the call stack into really_probe().
> Then it would be in the same place as the call to device_links_no_driver()
> (if probing fails).
I'll have a look.
> Furthermore, driver_probe_device() runtime resumes the parent before
> probing a device, but you're not doing the same for the suppliers.
> Looks like a bug.
I need to check, but for this patch it doesn't matter anyway as links don't
apply to runtime PM at this point yet.
> [snip]
> > --- linux-pm.orig/include/linux/device.h
> > +++ linux-pm/include/linux/device.h
> > @@ -706,6 +706,35 @@ struct device_dma_parameters {
> > unsigned long segment_boundary_mask;
> > };
> >
> > +enum device_link_status {
> > + DEVICE_LINK_NO_STATE = -1,
>
> How about a comment like /* Stateless. */ so that the relationship to
> the DEVICE_LINK_STATELESS flag is clear.
>
> > + DEVICE_LINK_DORMANT = 0, /* Link not in use. */
> > + DEVICE_LINK_AVAILABLE, /* Supplier driver is present. */
> > + DEVICE_LINK_ACTIVE, /* Consumer driver is present too. */
> > + DEVICE_LINK_CONSUMER_PROBE, /* Consumer is probing. */
> > + DEVICE_LINK_SUPPLIER_UNBIND, /* Supplier is unbinding. */
> > +};
> > +
> > +/*
> > + * Device link flags.
> > + *
> > + * STATELESS: The state machine is not applicable to this link.
>
> IMO the consequence of setting this flag is not immediately clear from the
> comment. How about: "Ignore driver presence."
Well, I'm not sure if that's any better to be honest. The consequences would
still be unclear with it, but for a different reason. :-)
> > + * AUTOREMOVE: Remove this link automatically on cunsumer driver unbind.
Thanks,
Rafael