Re: Problems with get_driver() and driver_attach() (and new_id too)
From: Dmitry Torokhov
Date: Thu Jan 05 2012 - 13:01:32 EST
Hi Alan,
On Thu, Jan 05, 2012 at 11:31:00AM -0500, Alan Stern wrote:
> Greg and Kay:
>
> There are some nasty problems connected with the driver core's
> get_driver(), put_driver(), and driver_attach(). Not just
> implementation bugs, but deeper conceptual difficulties.
>
> Let's start with get_driver(). Its comment says that it increments the
> driver's refcount, just like get_device() and a lot of other utility
> routines.
>
> But a struct driver is _not_ like a struct device! It resembles a
> piece of code more than a piece of data -- it acts as an encapsulation
> of a driver. Incrementing its refcount doesn't have much meaning
> because a driver's lifetime isn't determined by the structure's
> refcount; it's determined by when the driver's module gets unloaded.
>
> What really matters for a driver is whether or not it is registered.
> Drivers expect, for example, that none of their methods will be called
> after driver_unregister() returns. It doesn't matter if some other
> thread still holds a reference to the driver structure; that reference
> mustn't be used for accessing the driver code after unregistration.
> And of course, driver_attach() does access the driver code, by calling
> the probe routine.
Agree here.
>
> An example where this is violated occurs in the usb-serial core. Each
> serial driver module registers (at least) two driver structures, one on
> the usb_serial_bus and one on the usb_bus. The usb_serial_driver
> structure contains a pointer to the usb_driver structure, and this
> pointer is passed to get_driver() when the serial driver's new_id sysfs
> attribute is written to.
>
> Now, udev scripts are capable of writing to sysfs attributes very soon
> after the attribute is created. In the case of USB serial drivers, we
> have a bug report of a situation where this write took place after the
> usb_serial_driver was registered but before the usb_driver was
> registered. Thus, get_driver() was handed a pointer to a driver
> structure that had not even been initialized, let alone registered, and
> so naturally it crashed.
>
> Almost as bad is what can happen when a driver is unregistered while
> some thread is holding a reference obtained from get_driver(). The
> reference prevents the driver structure from being freed, but it
> doesn't prevent the thread from calling driver_attach() after the
> unregistration is complete, at which time the driver code does not
> expect to be invoked.
>
> To fix these problems, we need to change the semantics of get_driver()
> and put_driver(). Instead of taking a reference to the driver
> structure, get_driver() should check whether the driver is currently
> registered. If not, return NULL; otherwise, pin the driver (i.e.,
> block it from being unregistered) until put_driver() is called.
Or maybe we should just drop get_driver() and put_driver() and simply make
sure that driver_attach() does not race with driver_unregister()?
I think pinning a driver so that it can't be unregistered (and
consequently having module unload hang) is a misfeature.
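
To make the idea concrete, here is a rough userspace model of what I have
in mind. It is only a sketch, not driver-core code: every model_* name is
made up for illustration, a pthread mutex stands in for whatever lock the
bus or core would really use, and a counter stands in for calling the probe
routine. The point is that attach checks "registered" under the same lock
that unregister takes, so nothing has to be pinned beyond the attach call
itself.

```c
/* Userspace model only: the model_* names are hypothetical, not kernel API. */
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

struct model_driver {
	bool registered;	/* set by register, cleared by unregister */
	int probe_calls;	/* counts probe invocations, for demonstration */
};

/* One lock serializes attach against register and unregister. */
static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER;

static void model_driver_register(struct model_driver *drv)
{
	pthread_mutex_lock(&model_lock);
	drv->registered = true;
	pthread_mutex_unlock(&model_lock);
}

/* Once this returns, no later attach can reach the driver's code. */
static void model_driver_unregister(struct model_driver *drv)
{
	pthread_mutex_lock(&model_lock);
	drv->registered = false;
	pthread_mutex_unlock(&model_lock);
}

/* Returns 0 if the driver was probed, -ENODEV if it is not registered. */
static int model_driver_attach(struct model_driver *drv)
{
	int ret = -ENODEV;

	pthread_mutex_lock(&model_lock);
	if (drv->registered) {
		drv->probe_calls++;	/* stands in for calling ->probe() */
		ret = 0;
	}
	pthread_mutex_unlock(&model_lock);
	return ret;
}
```

In this model unregister only waits for an attach that is already in
progress; it is never blocked by a reference somebody forgot to drop.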
>
> This will require some code auditing, because there are places where
> get_driver() is called without checking the return value (see
> drivers/pci/pci_driver.c:pci_add_dynid() for an example; there are
> others). It should be marked __must_check.
>
> Also, there are places that call driver_attach() without first calling
> get_driver() (see drivers/input/gameport/gameport.c,
> drivers/input/serio/serio.c, and drivers/char/agp/amd64-agp.c). They
> may or may not be safe; I don't know.
Serio and gameport are safe as everything is protected by serio_mutex, so
it is not possible to yank the driver out while we are trying to attach
it to a device.
>
> One more thing. The new_id sysfs attribute can cause problems of its
> own. Writes to it cause a dynamic ID structure to be allocated, and
> these structures will leak unless they are properly deallocated.
> Normally they are freed when the driver is unregistered. But what if
> registration fails to begin with? It might fail at a point after the
> new_id attribute was created, which means the attribute could have been
> written to. The dynamic IDs need to be freed after registration fails,
> but nobody does this currently.
>
Don't we create the corresponding sysfs attributes only after the driver
has successfully registered? And the attributes are the only way to add (and
thus allocate) new ids, so I do not see why we'd be leaking here.
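
That said, if a write can in fact sneak in before a late failure, the
cleanup itself would be small. Below is a userspace model of an error path
that drops the dynamic IDs. It is only a sketch under that assumption: the
model_* names and the late_step_err parameter are invented for illustration
and do not correspond to any existing bus code.

```c
/* Userspace model only: all names here are hypothetical. */
#include <errno.h>
#include <stdlib.h>

struct model_dynid {
	struct model_dynid *next;
	unsigned int vendor, device;
};

struct model_driver_ids {
	struct model_dynid *dynids;	/* IDs added through new_id */
};

/* Models a write to new_id: allocate one entry and link it in. */
static int model_add_dynid(struct model_driver_ids *drv,
			   unsigned int vendor, unsigned int device)
{
	struct model_dynid *id = malloc(sizeof(*id));

	if (!id)
		return -ENOMEM;
	id->vendor = vendor;
	id->device = device;
	id->next = drv->dynids;
	drv->dynids = id;
	return 0;
}

/* Frees every entry and returns how many were released. */
static int model_free_dynids(struct model_driver_ids *drv)
{
	int freed = 0;

	while (drv->dynids) {
		struct model_dynid *id = drv->dynids;

		drv->dynids = id->next;
		free(id);
		freed++;
	}
	return freed;
}

/*
 * Models a registration whose last step fails (late_step_err != 0) after
 * new_id could already have been written. The error path drops the IDs.
 */
static int model_register(struct model_driver_ids *drv, int late_step_err)
{
	if (late_step_err) {
		model_free_dynids(drv);
		return late_step_err;
	}
	return 0;
}
```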
Thanks.
--
Dmitry