Re: device struct bloat

From: Peter Zijlstra
Date: Tue Nov 06 2007 - 12:19:46 EST


On Tue, 2007-11-06 at 11:32 -0500, Alan Stern wrote:
> On Tue, 6 Nov 2007, Peter Zijlstra wrote:
> > On Tue, 2007-11-06 at 10:36 -0500, Alan Stern wrote:

> > > You can't possibly solve the lockdep problems here with a simple-minded
> > > approach like your DRIVER_NORMAL, DRIVER_PARENT, etc. The device tree
> > > is arbitrarily deep & wide, and there is at least one routine that
> > > acquires the semaphores for _all_ the devices in the tree.
> >
> > *blink* someone needs to take all locks - why?
>
> It's for system suspend. All the devices are locked to prevent driver
> binding and unbinding during the suspend transition.

Just locking the tree root is not enough? (thereby avoiding all any new
modifying operation to descend into the tree).

> Even apart from that, there are places where the USB core needs to
> acquire multiple layers of locks (more than just two). For example, if
> somebody has hubs nested several deep and then unplugs the hub closest
> to the computer, this will happen.

Pin the sub-tree root with a reference, iterate the sub-tree and delete
the leafs one by one?

> Shucks, any time a driver's probe routine tries to register a child
> device you will run into problems. probe() is always called with the
> device locked, and registration of a child will acquire the child's
> lock in order to probe drivers for the child.

Right, so you say you have unbounded stack recursion? How about pushing
each probe into a stack (workqueue perhaps). Then each stack entry would
only do a single level probe, if it would recurse push a new entry on
the stack. Basic recursive -> stack transform.

> > > This fact
> > > alone seems to preclude using lockdep for device locks. (If there was
> > > a form of mutex_lock() that bypassed the lockdep checks, you could use
> > > it and avoid these issues.)
> >
> > I'm sceptical of this, since its a simple tree (as opposed to a balanced
> > tree) a simple lock-coupling approach should be enough to guarantee
> > consistency.
>
> I don't follow your reasoning. Let's say a task wants to lock all the
> direct children of a particular device. In what order should the locks
> be acquired?

I'd first start by asking if you want to lock all the children or the
sub-tree. The latter can be done by locking the root of said sub-tree.

> There's no natural tree-related ordering to follow.

Of course there is a natural deadlock free locking order in a tree: lock
the parent, lock all its children, repeat by noting that each child is a
parent again. If you only ever lock top down, there should not be any
lateral dependencies.

> In the simple case where locks are acquired along a path in the tree,
> you could solve the lockdep issues by passing mutex_lock_nested() a key
> equal to the device's depth in the tree. But that won't work with more
> complicated cases.

And only up to 8, as that's the max nesting depth.

> > > Deadlock is a serious consideration. For the most part, routines
> > > locking devices do so along a single path in the tree. For this simple
> > > case the rule is: Never acquire a parent's lock while holding the
> > > child's lock.
> >
> > Sure, but once you have a parent's lock, you could unlock your
> > grandparent, no? (it having a locked child, your parent, should be
> > enough to guarantee its continued existence)
>
> Of course. So what? In many cases the code needs to keep all three
> locks. The locks don't merely guarantee the device structs' continued
> existence (a kref could do that) -- they serialize a number of related
> operations.

Right, but does the grandparent really need serialisation? Normal simple
tree manipulations don't require more than 2 locks to guarantee tree
consistency. So you either don't have a simple tree, or you have more
requirements.

> > > The routine that locks all the devices acquires the locks in order of
> > > device registration. The idea here is that children are always
> > > registered _after_ their parents, so this should be compatible with the
> > > previous rule. But there is a potential problem: device_move() can
> > > move an older child under a younger parent!
> >
> > Seems like a weird rule, a typical tree locking rule would be to lock
> > them top-down - such a rule can easily cope with moves: lock old parent,
> > lock child, lock new parent, move child, unlock all in reverse order.
>
> Yep. But this isn't a typical tree-locking. System suspend sometimes
> requires that devices be powered down in a specific order, and there
> are constraints not captured by the positions in the tree. We get
> around this by powering devices down in reverse order of detection,
> which should always be safe. Although the locking need not necessarily
> follow these same constraints, it is certainly the easiest approach.

Ah, there are more constraints than tree order (which as I get it
resembles bus order)? Which confuses me, as you cannot detect a leaf
before you have a parent, so top-down should still be good, no?

> (And by the way, your example rule "lock old parent, lock child, lock
> new parent" is subject to deadlocks. What if another task tries to
> move a different device from under the new parent to the old parent at
> the same time?)

D'0h, right you are. How about a delete, insert sequence?

> > Like said, I think the tree locking model should be revisited. A
> > top-down locking model with lock-coupling should, from my ignorant
> > perspective, solve your problems.
>
> I don't understand what you mean by "lock-coupling".

A
B C
D E F G

In order to lock F we do:
lock A, lock C, unlock A, lock F, unlock C

Look at it this way, either you're more complex than the VFS or it can
be annotated :-)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/