PM domain change on unbound devices warning on ipmi_si unload

From: Joe Lawrence
Date: Wed Jan 27 2016 - 23:46:01 EST


Starting in 4.5-rc1, I noticed this warning on ipmi_si driver removal:

% modprobe ipmi_si
% rmmod ipmi_si

bus: 'platform': driver_probe_device: matched device IPI0001:00 with driver ipmi_si
bus: 'platform': really_probe: probing driver ipmi_si with device IPI0001:00
ipmi_si IPI0001:00: ipmi_si: probing via ACPI
ipmi_si IPI0001:00: [io 0x0ca2-0x0ca3] regsize 1 spacing 1 irq 0
ipmi_si: Adding ACPI-specified kcs state machine
driver: 'ipmi_si': driver_bound: bound to device 'IPI0001:00'
bus: 'platform': really_probe: bound device IPI0001:00 to driver ipmi_si
IPMI System Interface driver.
ipmi_si: probing via SMBIOS
ipmi_si: SMBIOS: io 0xda2 regsize 1 spacing 1 irq 0
ipmi_si: Adding SMBIOS-specified kcs state machine
ipmi_si: Trying ACPI-specified kcs state machine at i/o address 0xca2, slave address 0x0, irq 0
Registering platform device 'ipmi_bmc.0674.66'. Parent at platform
driver: 'ipmi': driver_bound: bound to device 'ipmi_bmc.0674.66'
ipmi_si IPI0001:00: Found new BMC (man_id: 0x000077, prod_id: 0x0674, dev_id: 0x42)
ipmi_si IPI0001:00: IPMI kcs interface initialized
------------[ cut here ]------------
WARNING: CPU: 39 PID: 3678 at drivers/base/power/common.c:150 dev_pm_domain_set+0x52/0x60()
PM domains can only be changed for unbound devices
[ ... snip ... ]
CPU: 39 PID: 3678 Comm: rmmod Tainted: G OE 4.5.0-rc1+ #57
Hardware name: Stratus ftServer 6800/G7LYY, BIOS BIOS Version 8.1:61 09/10/2015
0000000000000000 000000003883b68d ffff8820334d7d30 ffffffff8132caf0
ffff8820334d7d78 ffff8820334d7d68 ffffffff8107f1b6 ffff8810351eca00
0000000000000000 0000000000000001 0000000001c2a330 0000000001c29010
Call Trace:
[<ffffffff8132caf0>] dump_stack+0x44/0x64
[<ffffffff8107f1b6>] warn_slowpath_common+0x86/0xc0
[<ffffffff8107f24c>] warn_slowpath_fmt+0x5c/0x80
[<ffffffff8145e152>] dev_pm_domain_set+0x52/0x60
[<ffffffff813a38ae>] acpi_dev_pm_detach+0x3f/0x84
[<ffffffff8145e0d7>] dev_pm_domain_detach+0x27/0x30
[<ffffffff814575f8>] platform_drv_remove+0x38/0x40
[<ffffffff814557ba>] __device_release_driver+0x9a/0x140
[<ffffffff81455968>] driver_detach+0xb8/0xc0
[<ffffffff814547b5>] bus_remove_driver+0x55/0xd0
[<ffffffff814560cc>] driver_unregister+0x2c/0x50
[<ffffffff814576b2>] platform_driver_unregister+0x12/0x20
[<ffffffffa02586c9>] cleanup_ipmi_si+0x29/0xa0 [ipmi_si]
[<ffffffff81102100>] SyS_delete_module+0x190/0x220
[<ffffffff8167ffee>] entry_SYSCALL_64_fastpath+0x12/0x71
---[ end trace 671ca97b9ac15462 ]---

My platform has two BMCs (perhaps this is messing with a refcount
somewhere), but I wonder about the ordering of this code:

  __device_release_driver(struct device *dev)

    drv->remove(dev);
      [ platform_drv_remove ]
        ...
        dev_pm_domain_detach
          device_is_bound
            return dev->p && klist_node_attached(&dev->p->knode_driver)
    ...
    klist_remove(&dev->p->knode_driver);

Does the device_is_bound check reached through dev_pm_domain_detach depend
on the klist_remove at the bottom of __device_release_driver having already
run? If so, could these two be out of order?
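
To make the ordering question concrete, here is a minimal userspace sketch.
It is only a toy model under my own assumptions: every toy_* name is
hypothetical, nothing here is kernel code, and "bound" is reduced to a
single flag standing in for klist_node_attached(&dev->p->knode_driver).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct device; not the kernel structure. */
struct toy_device {
	bool knode_attached;	/* models klist_node_attached(&dev->p->knode_driver) */
	int warnings;		/* how often the "unbound devices only" check fired */
};

/* Models device_is_bound(): bound while the driver knode is attached. */
static bool toy_device_is_bound(const struct toy_device *dev)
{
	return dev->knode_attached;
}

/* Models dev_pm_domain_set(): complains if the device still looks bound. */
static void toy_pm_domain_set(struct toy_device *dev)
{
	if (toy_device_is_bound(dev))
		dev->warnings++;
}

/* Models the remove callback, which ends up detaching the PM domain. */
static void toy_drv_remove(struct toy_device *dev)
{
	toy_pm_domain_set(dev);
}

/* Ordering as I read it in 4.5-rc1: remove callback first, klist_remove last. */
int toy_release_current_order(void)
{
	struct toy_device dev = { .knode_attached = true, .warnings = 0 };

	toy_drv_remove(&dev);		/* check runs while still attached */
	dev.knode_attached = false;	/* models klist_remove() */
	return dev.warnings;
}

/* Ordering with klist_remove moved ahead of the remove callback. */
int toy_release_swapped_order(void)
{
	struct toy_device dev = { .knode_attached = true, .warnings = 0 };

	dev.knode_attached = false;	/* models klist_remove() */
	toy_drv_remove(&dev);		/* check now sees an unbound device */
	return dev.warnings;
}
```

In this model toy_release_current_order() reports one warning and
toy_release_swapped_order() reports none. It says nothing about whether
other code depends on the knode staying attached during remove.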

This is core driver code, so I'm assuming it's not something as simple
as the following (which avoided the warning on unload at least). Any
suggestions or extra debugging ideas welcome! This occurs on every
unload, so I'd be glad to test real solutions :)

Thanks,

-- Joe

-->8--

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index c4da2df..bba54e1 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -756,6 +756,7 @@ static void __device_release_driver(struct device *dev)

 		pm_runtime_put_sync(dev);

+		klist_remove(&dev->p->knode_driver);
 		if (dev->bus && dev->bus->remove)
 			dev->bus->remove(dev);
 		else if (drv->remove)
@@ -767,7 +768,6 @@ static void __device_release_driver(struct device *dev)
 			dev->pm_domain->dismiss(dev);
 		pm_runtime_reinit(dev);

-		klist_remove(&dev->p->knode_driver);
 		device_pm_check_callbacks(dev);
 		if (dev->bus)
 			blocking_notifier_call_chain(&dev->bus->p->bus_notifier,