CONFIG_DEBUG_TEST_DRIVER_REMOVE causes unremovable drivers to bind devices twice

From: Laszlo Ersek
Date: Mon Oct 10 2016 - 09:17:41 EST


Hi,

Greg asked me to stick to email with this bug report, so I'm reposting
the original kernel bugzilla report to personal addresses, and lkml.

Thanks,
Laszlo

https://bugzilla.kernel.org/show_bug.cgi?id=177021

Bug ID: 177021
Summary: [driver core] CONFIG_DEBUG_TEST_DRIVER_REMOVE causes
unremovable drivers to bind devices twice
Product: Drivers
Version: 2.5
Kernel Version: v4.8-2283-ga3443cd (4.9.0-0.rc0.git2.1.fc26.aarch64)
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Other
Assignee: drivers_other@xxxxxxxxxxxxxxxxxxxx
Reporter: lersek@xxxxxxxxxx
CC: arnd@xxxxxxxx, greg@xxxxxxxxx
Regression: No

CONFIG_DEBUG_TEST_DRIVER_REMOVE was added in the following commit:

> commit bea5b158ff0da9c7246ff391f754f5f38e34577a
> Author: Rob Herring <robh@xxxxxxxxxx>
> Date: Thu Aug 11 10:20:58 2016 -0500
>
> driver core: add test of driver remove calls during probe
>
> In recent discussions on ksummit-discuss[1], it was suggested to do a
> sequence of probe, remove, probe for testing driver remove paths. This
> adds a kconfig option for said test.
>
> [1] https://lists.linuxfoundation.org/pipermail/ksummit-discuss/2016-August/003459.html
>
> Suggested-by: Arnd Bergmann <arnd@xxxxxxxx>
> Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Rob Herring <robh@xxxxxxxxxx>
> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
>
> diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> index 98504ec99c7d..fdf44cac08e6 100644
> --- a/drivers/base/Kconfig
> +++ b/drivers/base/Kconfig
> @@ -212,6 +212,16 @@ config DEBUG_DEVRES
>
> If you are unsure about this, Say N here.
>
> +config DEBUG_TEST_DRIVER_REMOVE
> + bool "Test driver remove calls during probe"
> + depends on DEBUG_KERNEL
> + help
> + Say Y here if you want the Driver core to test driver remove functions
> + by calling probe, remove, probe. This tests the remove path without
> + having to unbind the driver or unload the driver module.
> +
> + If you are unsure about this, say N here.
> +
> config SYS_HYPERVISOR
> bool
> default n
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 16688f50729c..4910e6db2a34 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -328,68 +328,89 @@ static DECLARE_WAIT_QUEUE_HEAD(probe_waitqueue);
> static int really_probe(struct device *dev, struct device_driver *drv)
> {
> int ret = -EPROBE_DEFER;
> int local_trigger_count = atomic_read(&deferred_trigger_count);
> + bool test_remove = IS_ENABLED(CONFIG_DEBUG_TEST_DRIVER_REMOVE);
>
> if (defer_all_probes) {
> /*
> * Value of defer_all_probes can be set only by
> * device_defer_all_probes_enable() which, in turn, will call
> * wait_for_device_probe() right after that to avoid any races.
> */
> dev_dbg(dev, "Driver %s force probe deferral\n", drv->name);
> driver_deferred_probe_add(dev);
> return ret;
> }
>
> atomic_inc(&probe_count);
> pr_debug("bus: '%s': %s: probing driver %s with device %s\n",
> drv->bus->name, __func__, drv->name, dev_name(dev));
> WARN_ON(!list_empty(&dev->devres_head));
>
> +re_probe:
> dev->driver = drv;
>
> /* If using pinctrl, bind pins now before probing */
> ret = pinctrl_bind_pins(dev);
> if (ret)
> goto pinctrl_bind_failed;
>
> if (driver_sysfs_add(dev)) {
> printk(KERN_ERR "%s: driver_sysfs_add(%s) failed\n",
> __func__, dev_name(dev));
> goto probe_failed;
> }
>
> if (dev->pm_domain && dev->pm_domain->activate) {
> ret = dev->pm_domain->activate(dev);
> if (ret)
> goto probe_failed;
> }
>
> /*
> * Ensure devices are listed in devices_kset in correct order
> * It's important to move Dev to the end of devices_kset before
> * calling .probe, because it could be recursive and parent Dev
> * should always go first
> */
> devices_kset_move_last(dev);
>
> if (dev->bus->probe) {
> ret = dev->bus->probe(dev);
> if (ret)
> goto probe_failed;
> } else if (drv->probe) {
> ret = drv->probe(dev);
> if (ret)
> goto probe_failed;
> }
>
> + if (test_remove) {
> + test_remove = false;
> +
> + if (dev->bus && dev->bus->remove)
> + dev->bus->remove(dev);
> + else if (drv->remove)
> + drv->remove(dev);
> +
> + devres_release_all(dev);
> + driver_sysfs_remove(dev);
> + dev->driver = NULL;
> + dev_set_drvdata(dev, NULL);
> + if (dev->pm_domain && dev->pm_domain->dismiss)
> + dev->pm_domain->dismiss(dev);
> + pm_runtime_reinit(dev);
> +
> + goto re_probe;
> + }
> +
> pinctrl_init_done(dev);
>
> if (dev->pm_domain && dev->pm_domain->sync)
> dev->pm_domain->sync(dev);
>
> driver_bound(dev);
> ret = 1;
> pr_debug("bus: '%s': %s: bound device %s to driver %s\n",
> drv->bus->name, __func__, dev_name(dev), drv->name);
> goto done;

If none of the remove() callbacks are non-NULL -- that is, when the
device cannot be un-bound --, then none of the remove() callbacks are
called. That's good.

However:
- other resources are nonetheless released (from under the live device),
and
- the original device -- which is still bound -- is re-probed. This
leads to crashes such as the following:

> OF: PCI: host bridge /pcie@10000000 ranges: <------------ first probing
> OF: PCI: IO 0x3eff0000..0x3effffff -> 0x00000000
> OF: PCI: MEM 0x10000000..0x3efeffff -> 0x10000000
> OF: PCI: MEM 0x8000000000..0xffffffffff -> 0x8000000000
> pci-host-generic 3f000000.pcie: ECAM at [mem 0x3f000000-0x3fffffff] for [bus 00-0f]
> pci-host-generic 3f000000.pcie: PCI host bridge to bus 0000:00
> pci_bus 0000:00: root bus resource [bus 00-0f]
> pci_bus 0000:00: root bus resource [io 0x0000-0xffff]
> pci_bus 0000:00: root bus resource [mem 0x10000000-0x3efeffff]
> pci_bus 0000:00: root bus resource [mem 0x8000000000-0xffffffffff]
> pci 0000:00:02.0: BAR 0: assigned [mem 0x8000000000-0x8000003fff 64bit]
> pci 0000:00:03.0: BAR 4: assigned [mem 0x8000004000-0x8000007fff 64bit pref]
> pci 0000:00:01.0: BAR 1: assigned [mem 0x10000000-0x10000fff]
> pci 0000:00:03.0: BAR 1: assigned [mem 0x10001000-0x10001fff]
> pci 0000:00:01.0: BAR 0: assigned [io 0x1000-0x103f]
> pci 0000:00:02.0: enabling device (0000 -> 0002)
> OF: PCI: host bridge /pcie@10000000 ranges: <------------ second probing
> OF: PCI: IO 0x3eff0000..0x3effffff -> 0x00000000
> OF: PCI: MEM 0x10000000..0x3efeffff -> 0x10000000
> OF: PCI: MEM 0x8000000000..0xffffffffff -> 0x8000000000
> ------------[ cut here ]------------
> kernel BUG at lib/ioremap.c:64!
> Internal error: Oops - BUG: 0 [#1] SMP
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.9.0-0.rc0.git2.1.fc26.aarch64 #1
> Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
> task: fffffe00e402e400 task.stack: fffffe00e8028000
> PC is at ioremap_page_range+0x168/0x2f8
> LR is at pci_remap_iospace+0x68/0x88
> pc : [<fffffc0008487c48>] lr : [<fffffc00084fc6e0>] pstate: 80000145
> sp : fffffe00e802ba80
> x29: fffffe00e802ba80 x28: 00000200c01f0000
> x27: 0140000000000800 x26: fffffdff7ee10000
> x25: 0400000000000001 x24: 00e8000000000707
> x23: fffffc0009ff7fd8 x22: fffffdff7ee10000
> x21: fffffdff7ee00000 x20: fffffdff7ee0ffff
> x19: 000000003eff0000 x18: 0000000000000010
> x17: 0000000000000000 x16: 0000000000000000
> x15: 0000000000000006 x14: fffffc0089cc42f7
> x13: fffffc0009cc4305 x12: 00000000000000ed
> x11: 0000000000000006 x10: 00000000000000ee
> x9 : 0000000000000001 x8 : fffffe00e8028000
> x7 : fffffc0009002e68 x6 : 00000200c01f0000
> x5 : fffffe000104f700 x4 : 000000000000ffff
> x3 : fffffc0008ee5eb0 x2 : 0000000040000000
> x1 : 0000000041040000 x0 : 00e800003eff0707
>
> Process swapper/0 (pid: 1, stack limit = 0xfffffe00e8028020)
> Stack: (0xfffffe00e802ba80 to 0xfffffe00e802c000)
> ba80: fffffe00e802bb00 fffffc00084fc6e0 fffffe00f205c980 000000003eff0000
> baa0: 0000000000000000 fffffc0008f8d908 fffffe00f2050d00 fffffc0008c251c8
> bac0: fffffe00e802bb90 0000000000000000 fffffe00f205c980 0000000000000000
> bae0: fffffe00e802bb20 fffffc000851f60c fffffdff7e801384 fffffe00e0105810
> bb00: fffffe00e802bb20 fffffc000851f6e4 fffffe00f205c000 fffffe00e0105810
> bb20: fffffe00e802bbe0 fffffc000851f7e4 fffffe00e0105800 fffffe00e0105810
> bb40: fffffc0008f8dab8 fffffc0009f88000 0000000000000000 0000000000000000
> bb60: fffffc0008d063c0 fffffc0008d86398 fffffc0008e520e0 fffffc000894d8cc
> bb80: fffffe00e802bbb0 fffffc00087b5bc0 fffffe00f2056b80 fffffe00f2053d00
> bba0: 000000003eff0000 fffffe00ffffd068 fffffe00e802bbe0 fffffc000851f7d8
> bbc0: fffffe00e0105800 fffffe00e0105810 fffffc0008f8dab8 fffffc0009f88000
> bbe0: fffffe00e802bc00 fffffc000862a760 00000000fffffffe fffffe00e0105810
> bc00: fffffe00e802bc30 fffffc0008627ac8 fffffe00e0105810 0000000000000000
> bc20: fffffc0008f8dae0 0000000000000000 fffffe00e802bc70 fffffc0008627e54
> bc40: fffffe00e0105810 fffffe00e0105870 fffffc0008f8dae0 0000000000000000
> bc60: fffffc0008fc4000 fffffc0008d20464 fffffe00e802bca0 fffffc00086252a8
> bc80: 0000000000000000 fffffc0008f8dae0 fffffc0008627d28 fffffc0008d86398
> bca0: fffffe00e802bce0 fffffc0008627130 fffffc0008f8dae0 fffffe00f3038900
> bcc0: fffffc0008fc5768 0000000000000000 fffffe00fa703930 fffffe00e81ce198
> bce0: fffffe00e802bd00 fffffc0008626b50 fffffc0008f8dae0 fffffe00f3038900
> bd00: fffffe00e802bd40 fffffc0008628ee0 fffffc0008f8dae0 0000000000000000
> bd20: 0000000000000000 0000000000000006 fffffc0009060000 0000000000000000
> bd40: fffffe00e802bd60 fffffc000862a684 fffffc0008f8dab8 0000000000000000
> bd60: fffffe00e802bd80 fffffc0008d5c5ac fffffc0008d5c594 fffffe00e8028000
> bd80: fffffe00e802bd90 fffffc0008083594 fffffe00e802be00 fffffc0008d20dec
> bda0: 0000000000000102 fffffc0009060000 fffffc0008d863a8 0000000000000006
> bdc0: fffffc0008e51c00 0000000000000000 fffffc0008d20dec fffffc0008bbe670
> bde0: 0000000000000000 0000000600000006 fffffc0008d20464 fffffc0008d063c0
> be00: fffffe00e802bea0 fffffc0008945568 fffffc0008945550 0000000000000000
> be20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> be40: 0000000000000000 0000000000000000 0000000000000000 0000000000000001
> be60: 0000000000000001 0000000000000000 0000000000000000 0000000000000000
> be80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> bea0: 0000000000000000 fffffc0008083330 fffffc0008945550 0000000000000000
> bec0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> bee0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> bf00: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> bf20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> bf40: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> bf60: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> bf80: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> bfa0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> bfc0: 0000000000000000 0000000000000005 0000000000000000 0000000000000000
> bfe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> Call trace:
> Exception stack(0xfffffe00e802b8a0 to 0xfffffe00e802b9d0)
> b8a0: 000000003eff0000 0000040000000000 fffffe00e802ba80 fffffc0008487c48
> b8c0: 0000000080000145 000000000000003d fffffe00e802b960 fffffc0008137f44
> b8e0: 8b6ac059b5e8abc6 fffffe00e402ecd8 0000000000000000 0000000000000000
> b900: fffffe00e402e400 0000000000000000 0000000000000001 fffffc0009cc4000
> b920: 0000000000000002 fffffe00e402ecb0 fffffe00e0105bd0 0000000000000000
> b940: fffffe00e402e400 0000000000000000 00e800003eff0707 0000000041040000
> b960: 0000000040000000 fffffc0008ee5eb0 000000000000ffff fffffe000104f700
> b980: 00000200c01f0000 fffffc0009002e68 fffffe00e8028000 0000000000000001
> b9a0: 00000000000000ee 0000000000000006 00000000000000ed fffffc0009cc4305
> b9c0: fffffc0089cc42f7 0000000000000006
> [<fffffc0008487c48>] ioremap_page_range+0x168/0x2f8
> [<fffffc00084fc6e0>] pci_remap_iospace+0x68/0x88
> [<fffffc000851f6e4>] pci_host_common_probe+0x24c/0x318
> [<fffffc000851f7e4>] gen_pci_probe+0x34/0x40
> [<fffffc000862a760>] platform_drv_probe+0x60/0xc8
> [<fffffc0008627ac8>] driver_probe_device+0x240/0x4a0
> [<fffffc0008627e54>] __driver_attach+0x12c/0x130
> [<fffffc00086252a8>] bus_for_each_dev+0x70/0xb0
> [<fffffc0008627130>] driver_attach+0x30/0x40
> [<fffffc0008626b50>] bus_add_driver+0x200/0x2b8
> [<fffffc0008628ee0>] driver_register+0x68/0x100
> [<fffffc000862a684>] __platform_driver_register+0x54/0x60
> [<fffffc0008d5c5ac>] gen_pci_driver_init+0x18/0x20
> [<fffffc0008083594>] do_one_initcall+0x44/0x138
> [<fffffc0008d20dec>] kernel_init_freeable+0x23c/0x2dc
> [<fffffc0008945568>] kernel_init+0x18/0x110
> [<fffffc0008083330>] ret_from_fork+0x10/0x20
> Code: 54fff801 52800000 1400005e d503201f (d4210000)
> ---[ end trace bf4514e6b7f2008d ]---
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
>
> Kernel Offset: disabled
> Memory Limit: none
> ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

The "drivers/pci/host/pci-host-generic.c" driver has no remove method,
and its probe method works like this:

gen_pci_probe() [drivers/pci/host/pci-host-generic.c]
pci_host_common_probe() [drivers/pci/host/pci-host-common.c]
gen_pci_init() [drivers/pci/host/pci-host-common.c]
gen_pci_parse_request_of_pci_ranges() [drivers/pci/host/pci-host-common.c]
of_pci_get_host_bridge_resources() [drivers/of/of_pci.c]
pr_info(
"host bridge %s ranges:\n",
dev->full_name);
pci_remap_iospace() [drivers/pci/pci.c]

At the second probing attempt, the area is already mapped (the device is
still bound from the first probing), which triggers

BUG_ON(!pte_none(*pte));

in ioremap_pte_range() [lib/ioremap.c].

This is almost a regression because the kernel crashes with valid
drivers. It is not an error for a driver to not provide a remove()
callback, so in this instance CONFIG_DEBUG_TEST_DRIVER_REMOVE does not
expose a driver bug, it breaks with a valid driver. Not a regression for
the upstream kernel after all, because the Kconfig documentation
suggests N as default.

Proposed solution: if none of the remove() methods exist, or the
remove() method that exists fails, then don't release any resources, and
don't re-probe the device.

Thanks!