Re: [PATCH] driver core: flush async calls before testing driver removal

From: Vladimir Zapolskiy
Date: Sat Dec 10 2016 - 20:44:59 EST


Hello Greg,

I'm adding Tejun to the list of addressees.

On 12/10/2016 03:04 PM, Greg Kroah-Hartman wrote:
> On Sat, Dec 10, 2016 at 02:38:41PM +0200, Vladimir Zapolskiy wrote:
>> Hello Greg,
>>
>> On 12/10/2016 09:32 AM, Greg Kroah-Hartman wrote:
>>> On Sat, Dec 10, 2016 at 02:15:19AM +0200, Vladimir Zapolskiy wrote:
>>>> If CONFIG_DEBUG_TEST_DRIVER_REMOVE option is enabled a number of false
>>>> positives are reported for ATA controller drivers, because ATA port
>>>> probes are done asynchronously, and the same problem may also touch
>>>> other asynchronously probed drivers.
>>>>
>>>> To reduce the rate of false reports on boot call async_synchronize_full()
>>>> before attempting to remove a driver, the same is done in delete_module()
>>>> syscall for all possible drivers and in __device_release_driver() function
>>>> for asynchronously probed drivers.
>>>
>>> __device_release_driver() already calls this function, why call it
>>> again?
>>>
>>
>> __device_release_driver() is not called on test removal of drivers, if
>> CONFIG_DEBUG_TEST_DRIVER_REMOVE option is enabled.
>>
>> This opens a possibility to races like one I've discovered:
>>
>> https://www.spinics.net/lists/linux-ide/msg53473.html
>>
>> Next, async_synchronize_full() from __device_release_driver() is not
>> called in case of removal of ATA controller drivers, because
>> driver_allows_async_probing(drv) return value is false.
>
> Hm, how does this not also get hit if you unbind/bind/unbind/bind/etc.
> from userspace as well? I don't think this is a
> CONFIG_DEBUG_TEST_DRIVER_REMOVE issue, but just that this option finds
> the problem corner cases as you are finding out :)
>

and you are right, I managed to reproduce exactly the same race as before
running the unmodified kernel built from Torvald's branch head:

root@imx6q-sabresd:~# zcat /proc/config.gz | grep TEST_DRIVER_REMOVE
# CONFIG_DEBUG_TEST_DRIVER_REMOVE is not set

root@imx6q-sabresd:~# uname -a
Linux imx6q-sabresd 4.9.0-rc8-00134-g0451698 #23 SMP Sun Dec 11 03:37:30 EET 2016 armv7l GNU/Linux

root@imx6q-sabresd:~# echo 2200000.sata > /sys/bus/platform/drivers/ahci-imx/unbind ; echo 2200000.sata > /sys/bus/platform/drivers/ahci-imx/bind; echo 2200000.sata > /sys/bus/platform/drivers/ahci-imx/unbind
ahci-imx 2200000.sata: fsl,transmit-level-mV not specified, using 00000024
ahci-imx 2200000.sata: fsl,transmit-boost-mdB not specified, using 00000480
ahci-imx 2200000.sata: fsl,transmit-atten-16ths not specified, using 00002000
ahci-imx 2200000.sata: fsl,receive-eq-mdB not specified, using 05000000
ahci-imx 2200000.sata: SSS flag set, parallel bus scan disabled
ahci-imx 2200000.sata: AHCI 0001.0300 32 slots 1 ports 3 Gbps 0x1 impl platform mode
ahci-imx 2200000.sata: flags: ncq sntf stag pm led clo only pmp pio slum part ccc apst
scsi host0: ahci-imx
ata3: SATA max UDMA/133 mmio [mem 0x02200000-0x02203fff] port 0x100 irq 72
ata3: SATA link down (SStatus 0 SControl 300)
ahci-imx 2200000.sata: no device found, disabling link.
ahci-imx 2200000.sata: pass .hotplug=1 to enable hotplug
------------[ cut here ]------------
WARNING: CPU: 2 PID: 375 at drivers/ata/libata-core.c:6482 ata_port_detach+0x130/0x140
Modules linked in: ahci_imx dw_hdmi_ahb_audio evbug
CPU: 2 PID: 375 Comm: sh Not tainted 4.9.0-rc8-00134-g0451698 #23
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Backtrace:
[<>] (dump_backtrace) from [<>] (show_stack+0x20/0x24)
[<>] (show_stack) from [<>] (dump_stack+0xb4/0xe8)
[<>] (dump_stack) from [<>] (__warn+0xe4/0x110)
[<>] (__warn) from [<>] (warn_slowpath_null+0x30/0x38)
[<>] (warn_slowpath_null) from [<>] (ata_port_detach+0x130/0x140)
[<>] (ata_port_detach) from [<>] (ata_platform_remove_one+0x34/0x4c)
[<>] (ata_platform_remove_one) from [<>] (platform_drv_remove+0x34/0x4c)
[<>] (platform_drv_remove) from [<>] (__device_release_driver+0x98/0x134)
[<>] (__device_release_driver) from [<>] (device_release_driver+0x30/0x3c)
[<>] (device_release_driver) from [<>] (unbind_store+0x88/0x108)
[<>] (unbind_store) from [<>] (drv_attr_store+0x30/0x3c)
[<>] (drv_attr_store) from [<>] (sysfs_kf_write+0x58/0x5c)
[<>] (sysfs_kf_write) from [<>] (kernfs_fop_write+0x108/0x21c)
[<>] (kernfs_fop_write) from [<>] (__vfs_write+0x3c/0x128)
[<>] (__vfs_write) from [<>] (vfs_write+0xb0/0x178)
[<>] (vfs_write) from [<>] (SyS_write+0x4c/0xa0)
[<>] (SyS_write) from [<>] (ret_fast_syscall+0x0/0x1c)
---[ end trace 457071853738c95e ]---

Tejun, will you look at the problem again?

--
With best wishes,
Vladimir