Re: [PATCH] PCI: take the rescan lock when adding devices during host probe
From: Konrad Dybcio
Date: Wed Oct 02 2024 - 16:31:58 EST
On 2.10.2024 10:36 AM, Bartosz Golaszewski wrote:
> On Tue, Oct 1, 2024 at 11:11 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>>
>> On Thu, Sep 26, 2024 at 03:09:23PM +0200, Bartosz Golaszewski wrote:
>>> From: Bartosz Golaszewski <bartosz.golaszewski@xxxxxxxxxx>
>>>
>>> Since adding the PCI power control code, we may end up with a race
>>> between the pwrctl platform device rescanning the bus and the host
>>> controller probe function. The latter needs to take the rescan lock when
>>> adding devices or may crash.
>>>
>>> Reported-by: Konrad Dybcio <konradybcio@xxxxxxxxxx>
>>> Fixes: 4565d2652a37 ("PCI/pwrctl: Add PCI power control core code")
>>> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@xxxxxxxxxx>
>>> ---
>>> drivers/pci/probe.c | 2 ++
>>> 1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>> index 4f68414c3086..f1615805f5b0 100644
>>> --- a/drivers/pci/probe.c
>>> +++ b/drivers/pci/probe.c
>>> @@ -3105,7 +3105,9 @@ int pci_host_probe(struct pci_host_bridge *bridge)
>>> list_for_each_entry(child, &bus->children, node)
>>> pcie_bus_configure_settings(child);
>>>
>>> + pci_lock_rescan_remove();
>>> pci_bus_add_devices(bus);
>>> + pci_unlock_rescan_remove();
>>
>> Seems like we do need locking here, but don't we need a more
>> comprehensive change? There are many other callers of
>> pci_bus_add_devices(), and most of them look similarly unprotected.
>>
>
> From a quick glance it looks like the majority of users are specific
> drivers (controller, hotplug, etc.). The calls inside pci_rescan_bus()
> and pci_rescan_bus_bridge_resize() are already protected from what I
> can tell. I'm not saying that the driver calls shouldn't be fixed but
> there's no immediate danger. This however fixes an issue we hit with
> PCI core so I'd send it upstream now and then we can think about the
> other use-cases.
Probably worth showing an example of how this can manifest:
removed a device through sysfs and called bus rescan:
[ 63.851901] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 63.861048] Mem abort info:
[ 63.863940] ESR = 0x0000000096000004
[ 63.867822] EC = 0x25: DABT (current EL), IL = 32 bits
[ 63.873291] SET = 0, FnV = 0
[ 63.876440] EA = 0, S1PTW = 0
[ 63.879688] FSC = 0x04: level 0 translation fault
[ 63.884711] Data abort info:
[ 63.887697] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 63.893344] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 63.898549] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 63.904009] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000116c36000
[ 63.910644] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
[ 63.917683] Internal error: Oops: 0000000096000004 [#1] SMP
[ 63.923413] Modules linked in:
[ 63.926561] CPU: 1 UID: 0 PID: 530 Comm: bash Tainted: G W <snip> #10176
[ 63.938971] Tainted: [W]=WARN
[ 63.942037] Hardware name: <snip>
[ 63.951239] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 63.958398] pc : __pi_strlen+0x14/0x150
[ 63.962350] lr : kernfs_name_hash+0x24/0x88
[ 63.966668] sp : ffff800083c43af0
[ 63.970089] x29: ffff800083c43af0 x28: ffff519888b83500 x27: 0000000000000000
[ 63.977420] x26: 0000000000000000 x25: 0000000000000000 x24: ffff519888b83500
[ 63.984751] x23: 0000000000000011 x22: ffff519886bd6040 x21: ffff519886bd5b00
[ 63.992081] x20: 0000000000000000 x19: 0000000000000000 x18: ffff80008a0bd0a8
[ 63.999410] x17: 0000000000000001 x16: ffff519888b83de8 x15: ffffa08dea3f5bf0
[ 64.006741] x14: ffff519888b83de8 x13: ffffffffffffffff x12: 0000000000000028
[ 64.014076] x11: ffffa08dea3f5bf0 x10: 0000000000000012 x9 : 0000000000000001
[ 64.021403] x8 : 0101010101010101 x7 : ffffa08de7a5e844 x6 : 0000000000000000
[ 64.028730] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[ 64.036062] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
[ 64.043390] Call trace:
[ 64.045918] __pi_strlen+0x14/0x150
[ 64.049508] kernfs_find_ns+0x80/0x13c
[ 64.053375] kernfs_remove_by_name_ns+0x54/0xf0
[ 64.058036] sysfs_remove_bin_file+0x24/0x34
[ 64.062436] pci_remove_resource_files+0x3c/0x84
[ 64.067190] pci_remove_sysfs_dev_files+0x28/0x38
[ 64.072025] pci_stop_bus_device+0x8c/0xd8
[ 64.076241] pci_stop_bus_device+0x40/0xd8
[ 64.080463] pci_stop_and_remove_bus_device_locked+0x28/0x48
[ 64.086277] remove_store+0x70/0xb0
[ 64.089878] dev_attr_store+0x20/0x38
[ 64.093649] sysfs_kf_write+0x58/0x78
[ 64.097426] kernfs_fop_write_iter+0xe8/0x184
[ 64.101905] vfs_write+0x2dc/0x308
[ 64.105413] ksys_write+0x7c/0xec
[ 64.108834] __arm64_sys_write+0x24/0x34
[ 64.112880] invoke_syscall+0x48/0x100
[ 64.116744] el0_svc_common+0xb4/0xe8
[ 64.120522] do_el0_svc+0x24/0x34
[ 64.123938] el0_svc+0x58/0xb4
[ 64.127085] el0t_64_sync_handler+0x50/0x12c
[ 64.131474] el0t_64_sync+0x198/0x19c
[ 64.135244] Code: 92402c04 b200c3e8 f13fc09f 5400088c (a9400c02)
[ 64.141508] ---[ end trace 0000000000000000 ]---
Konrad