Problems with get_driver() and driver_attach() (and new_id too)
From: Alan Stern
Date: Thu Jan 05 2012 - 11:31:04 EST
Greg and Kay:
There are some nasty problems connected with the driver core's
get_driver(), put_driver(), and driver_attach(). Not just
implementation bugs, but deeper conceptual difficulties.
Let's start with get_driver(). Its comment says that it increments the
driver's refcount, just like get_device() and a lot of other utility
routines.
But a struct driver is _not_ like a struct device! It resembles a
piece of code more than a piece of data -- it acts as an encapsulation
of a driver. Incrementing its refcount doesn't have much meaning
because a driver's lifetime isn't determined by the structure's
refcount; it's determined by when the driver's module gets unloaded.
What really matters for a driver is whether or not it is registered.
Drivers expect, for example, that none of their methods will be called
after driver_unregister() returns. It doesn't matter if some other
thread still holds a reference to the driver structure; that reference
mustn't be used for accessing the driver code after unregistration.
And of course, driver_attach() does access the driver code, by calling
the probe routine.
An example where this is violated occurs in the usb-serial core. Each
serial driver module registers (at least) two driver structures, one on
the usb_serial_bus and one on the usb_bus. The usb_serial_driver
structure contains a pointer to the usb_driver structure, and this
pointer is passed to get_driver() when the serial driver's new_id sysfs
attribute is written to.
Now, udev scripts are capable of writing to sysfs attributes very soon
after the attribute is created. In the case of USB serial drivers, we
have a bug report of a situation where this write took place after the
usb_serial_driver was registered but before the usb_driver was
registered. Thus, get_driver() was handed a pointer to a driver
structure that had not even been initialized, let alone registered, and
so naturally it crashed.
Almost as bad is what can happen when a driver is unregistered while
some thread is holding a reference obtained from get_driver(). The
reference prevents the driver structure from being freed, but it
doesn't prevent the thread from calling driver_attach() after the
unregistration is complete, at which time the driver code does not
expect to be invoked.
To fix these problems, we need to change the semantics of get_driver()
and put_driver(). Instead of taking a reference to the driver
structure, get_driver() should check whether the driver is currently
registered. If not, return NULL; otherwise, pin the driver (i.e.,
block it from being unregistered) until put_driver() is called.
This will require some code auditing, because there are places where
get_driver() is called without checking the return value (see
drivers/pci/pci_driver.c:pci_add_dynid() for an example; there are
others). It should be marked __must_check.
Also, there are places that call driver_attach() without first calling
get_driver() (see drivers/input/gameport/gameport.c,
drivers/input/serio/serio.c, and drivers/char/agp/amd64-agp.c). They
may or may not be safe; I don't know.
One more thing. The new_id sysfs attribute can cause problems of its
own. Writes to it cause a dynamic ID structure to be allocated, and
these structures will leak unless they are properly deallocated.
Normally they are freed when the driver is unregistered. But what if
registration fails to begin with? It might fail at a point after the
new_id attribute was created, which means the attribute could have been
written to. The dynamic IDs need to be freed after registration fails,
but nobody does this currently.
Alan Stern
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/