[bug] hung bootup in various drivers, was: "2.6.21-rc5: known regressions"
From: Ingo Molnar
Date: Fri Mar 30 2007 - 08:05:17 EST
i just found a new category of driver regressions in 2.6.21, doing
allyesconfig bzImage bootup tests: the init methods of various drivers
hangs in driver_unregister().
It is caused by this problem: the semantics of driver_unregister() [also
implicitly called in pci_driver_unregister()] has apparently changed
recently. If a driver does:
pci_register_driver(&my_driver);
...
if (some_failure) {
pci_unregister_driver(&my_driver);
...
}
it will hang the bootup in the following piece of code:
drivers/base/driver.c:
void driver_unregister(struct device_driver * drv)
{
bus_remove_driver(drv);
wait_for_completion(&drv->unloaded);
the completion is never done - because nobody removes the bus while the
init is still happening, obviously. (and bootup is serialized anyway)
now, the majority of drivers does the driver unregistry from its
module-cleanup function, so it's not affected by this problem. But if
you apply the debug patch attached further below, and do an allyesconfig
bzImage bootup, there's 3 hits already:
BUG: at drivers/base/driver.c:187 driver_unregister()
[<c0105ff9>] show_trace_log_lvl+0x19/0x2e
[<c01063e2>] show_trace+0x12/0x14
[<c01063f8>] dump_stack+0x14/0x16
[<c063f7e6>] driver_unregister+0x3d/0x43
[<c0488048>] pci_unregister_driver+0x10/0x5f
[<c1b5f7c7>] slgt_init+0x9b/0x1ca
[<c1b31a2d>] init+0x15d/0x2bd
[<c0105bc3>] kernel_thread_helper+0x7/0x10
BUG: at drivers/base/driver.c:187 driver_unregister()
[<c0105ff9>] show_trace_log_lvl+0x19/0x2e
[<c01063e2>] show_trace+0x12/0x14
[<c01063f8>] dump_stack+0x14/0x16
[<c063f7e6>] driver_unregister+0x3d/0x43
[<c0488048>] pci_unregister_driver+0x10/0x5f
[<c0619505>] init_ipmi_si+0x70a/0x738
[<c1b31a2d>] init+0x15d/0x2bd
[<c0105bc3>] kernel_thread_helper+0x7/0x10
BUG: at drivers/base/driver.c:187 driver_unregister()
[<c0105ff9>] show_trace_log_lvl+0x19/0x2e
[<c01063e2>] show_trace+0x12/0x14
[<c01063f8>] dump_stack+0x14/0x16
[<c063f7e6>] driver_unregister+0x3d/0x43
[<c0488048>] pci_unregister_driver+0x10/0x5f
[<c1b6d2d8>] tlan_probe+0x2dd/0x30e
[<c1b31a2d>] init+0x15d/0x2bd
[<c0105bc3>] kernel_thread_helper+0x7/0x10
possibly more could trigger. Each of these 3 places caused an actual
bootup hang on my testbox, so these are real regressions and need to be
fixed.
because there are a good number of drivers that do
pci_unregister_device() from their init function, and because i cannot
see anything obviously wrong in doing an unregister call after a
failure, i think it's driver_unregister() that needs to be fixed. Greg,
what do you think?
Ingo
Index: linux/drivers/base/driver.c
===================================================================
--- linux.orig/drivers/base/driver.c
+++ linux/drivers/base/driver.c
@@ -183,7 +183,8 @@ int driver_register(struct device_driver
void driver_unregister(struct device_driver * drv)
{
bus_remove_driver(drv);
- wait_for_completion(&drv->unloaded);
+ if (!drv->unloaded.done)
+ WARN_ON(1);
}
/**
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/