Re: [PATCH v9 10/13] PCI: Give pci_intx() its own devres callback

From: Kalra, Ashish
Date: Tue Jul 09 2024 - 04:13:05 EST


Hello Philipp,

On 7/9/2024 2:21 AM, Philipp Stanner wrote:
> @Bjorn, @Krzysztof
>
> On Mon, 2024-07-08 at 21:46 +0000, Ashish Kalra wrote:
>> With this patch applied, we are observing unloading and then
>> reloading issues with the AMD Crypto (CCP) driver:
> Thank you very much for digging into this, Ashish
>
> Could you give me some pointers how one could test CCP by himself?
>
>> with DEVRES logging enabled, we observe the following logs:
>>
>> [  218.093588] ccp 0000:a2:00.1: DEVRES REL 00000000c18c52fb
>> 0xffff8d09dc1972c0 devm_kzalloc_release (152 bytes)
>> [  218.105527] ccp 0000:a2:00.1: DEVRES REL 000000003091fb95
>> 0xffff8d09d3aad000 devm_kzalloc_release (3072 bytes)
>> [  218.117500] ccp 0000:a2:00.1: DEVRES REL 0000000049e4adfe
>> 0xffff8d09d588f000 pcim_intx_restore (4 bytes)
>> [  218.129519] ccp 0000:a2:00.1: DEVRES ADD 000000001a2ac6ad
>> 0xffff8cfa867b7cc0 pcim_intx_restore (4 bytes)
>> [  218.140434] ccp 0000:a2:00.1: DEVRES REL 00000000627ecaf7
>> 0xffff8d09d588f680 pcim_msi_release (16 bytes)
>> [  218.151665] ccp 0000:a2:00.1: DEVRES REL 0000000058b2252a
>> 0xffff8d09dc199680 msi_device_data_release (80 bytes)
>> [  218.163625] ccp 0000:a2:00.1: DEVRES REL 00000000435cc85e
>> 0xffff8d09d588ff80 devm_attr_group_remove (8 bytes)
>> [  218.175224] ccp 0000:a2:00.1: DEVRES REL 00000000cb6fcd9b
>> 0xffff8d09eb583660 pcim_addr_resource_release (40 bytes)
>> [  218.187319] ccp 0000:a2:00.1: DEVRES REL 00000000d64a8b84
>> 0xffff8d09eb583180 pcim_iomap_release (48 bytes)
>> [  218.198615] ccp 0000:a2:00.1: DEVRES REL 0000000099ac6b28
>> 0xffff8d09eb5830c0 pcim_addr_resource_release (40 bytes)
>> [  218.210730] ccp 0000:a2:00.1: DEVRES REL 00000000bdd27f88
>> 0xffff8d09d3ac2700 pcim_release (0 bytes)
>> [  218.221489] ccp 0000:a2:00.1: DEVRES REL 00000000e763315c
>> 0xffff8d09d3ac2240 devm_kzalloc_release (20 bytes)
>> [  218.233008] ccp 0000:a2:00.1: DEVRES REL 00000000ae90f983
>> 0xffff8d09dc25a800 devm_kzalloc_release (184 bytes)
>> [  218.245251] ccp 0000:23:00.1: DEVRES REL 00000000a2ec0085
>> 0xffff8cfa86bee700 fw_name_devm_release (16 bytes)
>> [  218.256748] ccp 0000:23:00.1: DEVRES REL 0000000021bccd98
>> 0xffff8cfaa528d5c0 devm_pages_release (16 bytes)
>> [  218.268044] ccp 0000:23:00.1: DEVRES REL 000000003ef7cbc7
>> 0xffff8cfaa1b5ec00 devm_kzalloc_release (104 bytes)
>> [  218.279631] ccp 0000:23:00.1: DEVRES REL 00000000619322e1
>> 0xffff8cfaa1b5e480 devm_kzalloc_release (152 bytes)
>> [  218.300438] ccp 0000:23:00.1: DEVRES REL 00000000c261523b
>> 0xffff8cfaad88b000 devm_kzalloc_release (3072 bytes)
>> [  218.331000] ccp 0000:23:00.1: DEVRES REL 00000000fbd19618
>> 0xffff8cfaa528d140 pcim_intx_restore (4 bytes)
>> [  218.361330] ccp 0000:23:00.1: DEVRES ADD 0000000057f8e767
>> 0xffff8cfa867b7740 pcim_intx_restore (4 bytes)
>> [  218.391226] ccp 0000:23:00.1: DEVRES REL 0000000058c9dce1
>> 0xffff8cfaa528d880 pcim_msi_release (16 bytes)
>> [  218.421340] ccp 0000:23:00.1: DEVRES REL 00000000c8ab08a7
>> 0xffff8cfa9e617300 msi_device_data_release (80 bytes)
>> [  218.452357] ccp 0000:23:00.1: DEVRES REL 00000000cf5baccb
>> 0xffff8cfaa528d8c0 devm_attr_group_remove (8 bytes)
>> [  218.483011] ccp 0000:23:00.1: DEVRES REL 00000000b8cbbadd
>> 0xffff8cfa9c596060 pcim_addr_resource_release (40 bytes)
>> [  218.514343] ccp 0000:23:00.1: DEVRES REL 00000000920f9607
>> 0xffff8cfa9c596c60 pcim_iomap_release (48 bytes)
>> [  218.544659] ccp 0000:23:00.1: DEVRES REL 00000000d401a708
>> 0xffff8cfa9c596840 pcim_addr_resource_release (40 bytes)
>> [  218.575774] ccp 0000:23:00.1: DEVRES REL 00000000865d2fa2
>> 0xffff8cfaa528d940 pcim_release (0 bytes)
>> [  218.605758] ccp 0000:23:00.1: DEVRES REL 00000000f5b79222
>> 0xffff8cfaa528d080 devm_kzalloc_release (20 bytes)
>> [  218.636260] ccp 0000:23:00.1: DEVRES REL 0000000037ef240a
>> 0xffff8cfa9eeb3f00 devm_kzalloc_release (184 bytes)
>>
>> and the CCP driver reload issue during driver probe:
>>
>> [  226.552684] pci 0000:23:00.1: Resources present before probing
>> [  226.568846] pci 0000:a2:00.1: Resources present before probing
>>
>> From the above DEVRES logging, it looks like pcim_intx_restore
>> associated resource is being released but then
>> being re-added during detach/unload, which causes really_probe() to
>> fail at probe time, as dev->devres_head is
>> not empty due to this added resource:
>> ...
>> [  218.331000] ccp 0000:23:00.1: DEVRES REL 00000000fbd19618
>> 0xffff8cfaa528d140 pcim_intx_restore (4 bytes)
>> [  218.361330] ccp 0000:23:00.1: DEVRES ADD 0000000057f8e767
>> 0xffff8cfa867b7740 pcim_intx_restore (4 bytes)
>> ...
>>
>> Going more deep into this:
>>
>> This is the initial pcim_intx_resoure associated resource being added
>> during first (CCP) driver load:
>>
>> [   40.418933]  pcim_intx+0x3a/0x120
>> [   40.418936]  pci_intx+0x8b/0xa0
>> [   40.418939]  __pci_enable_msix_range+0x369/0x530
>> [   40.418943]  pci_enable_msix_range+0x18/0x20
>> [   40.418946]  sp_pci_probe+0x106/0x310 [ccp]
>> [   40.418965] ipmi device interface
>> [   40.418960]  ? srso_alias_return_thunk+0x5/0xfbef5
>> [   40.418969]  local_pci_probe+0x4f/0xb0
>> [   40.418973]  work_for_cpu_fn+0x1e/0x30
>> [   40.418976]  process_one_work+0x183/0x350
>> [   40.418980]  worker_thread+0x2df/0x3f0
>> [   40.418982]  ? __pfx_worker_thread+0x10/0x10
>> [   40.418985]  kthread+0xd0/0x100
>> [   40.418987]  ? __pfx_kthread+0x10/0x10
>> [   40.418990]  ret_from_fork+0x40/0x60
>> [   40.418993]  ? __pfx_kthread+0x10/0x10
>> [   40.418996]  ret_from_fork_asm+0x1a/0x30
>> [   40.419001]  </TASK>
>> ..
>> ..
>> [   40.419012] ccp 0000:23:00.1: DEVRES ADD 00000000fbd19618
>> 0xffff8cfaa528d140 pcim_intx_restore (4 bytes)
>>
>> Now, at driver unload:
>> devres_release_all() -> remove_nodes() -> release_nodes() ...
>>
>> remove_nodes() moves normal devres entries to the todo list, as can
>> be seen with the following log:
>> ...
>> [  218.245241] moving node 00000000fbd19618 0xffff8cfaa528d140 from
>> devres to todo list
>> ...
>>
>> So, now this pcim_intx_resource associated resource is no longer part
>> of dev->devres_head list and has been
>> moved to the todo list.
>>
>> Later, when release_nodes() is invoked, it calls the associated
>> release() callback associated with this devres:
>> ...
>> [  218.331000] ccp 0000:23:00.1: DEVRES REL 00000000fbd19618
>> 0xffff8cfaa528d140 pcim_intx_restore (4 bytes)
>> ...
>>
>> The call flow for that is:
>> pcim_intx_restore() -> pci_intx() -> pcim_intx() ...
>>
>> Now, pcim_intx() calls get_or_create_intx_devres() which tries to
>> find it's associated devres using devres_find(), but
>> that fails to find the devres, as the devres is no longer on dev-
>>> devres_head and has been moved to todo list.
>> Therefore, get_or_create_intx_devres() adds a new devres at driver
>> unload/detach time:
>> ...
>> [  218.361330] ccp 0000:23:00.1: DEVRES ADD 0000000057f8e767
>> 0xffff8cfa867b7740 pcim_intx_restore (4 bytes)
>> ...
> You're absolutely right, that seems to be the issue precisely. In fact,
> this problem of PCI hybrid functions calling themselves again even
> forced me to implement a "pure unmanaged" version of
> __pci_request_region(). So it's a pity that I didn't think of that
> problem for pci_intx().
>
>> But, then this is an issue as pcim_intx() is supposed to restore the
>> original PCI INTx state on driver detach, but it now
>> operating on a newly added devres and not the original devres (added
>> at driver probe) which contains the original PCI INTx
>> state, so it will be restoring an incorrect PCI INTx state ?
> I think this is just UB and we don't have to think about whether it's
> the correct state or not – it must only be restored once, so it's
> broken in any case.
>
>> Additionally, this newly added devres causes driver reload/probe
>> failure as really_probe() now finds resources present
>> before probing.
> Yes, that has to be separated.
>
> @Bjorn:
> So I think the solution will be not to call into pci_intx() from
> pcim_intx_restore() at all anymore.
>
> Similar as we do with __pci_request_region() <-> __pcim_request_region().
>
> Let me dig into that..
>
> I guess you'll prefer me to send a fixup commit to squash into the
> pcim_intx() commit?
>
> I'm quite busy today, but will definitely deliver that quite soon.
>
>> Not sure, if this issue has been observed with other PCI device
>> drivers.
> Everyone using pci_intx() AND pcim_enable_device() will have this
> issue.
>
> The only thing I'm wondering about is where your code in
> drivers/crypto/ccp/ calls into pci_intx()?
>
Actually, the CCP driver does not explicitly call into pci_intx(), the flow to pci_intx() from CCP driver probe is as follows:

[   40.418933]  pcim_intx+0x3a/0x120
[   40.418936]  pci_intx+0x8b/0xa0
[   40.418939]  __pci_enable_msix_range+0x369/0x530
[   40.418943]  pci_enable_msix_range+0x18/0x20
[   40.418946]  sp_pci_probe+0x106/0x310 [ccp]
[   40.418965] ipmi device interface
[   40.418960]  ? srso_alias_return_thunk+0x5/0xfbef5
[   40.418969]  local_pci_probe+0x4f/0xb0

And obviously, pci_intx()->pcim_intx() get invoked due to pcim_intx_restore() callback being invoked during sp_pci_exit() code path, as below:

[  218.128978]  pcim_intx+0x3a/0x120
[  218.128986]  ? srso_alias_return_thunk+0x5/0xfbef5
[  218.128999]  pci_intx+0x8b/0xa0
[  218.129010]  pcim_intx_restore+0x1b/0x30
[  218.129019]  release_nodes+0x65/0x90
[  218.129031]  devres_release_all+0x9f/0xe0
[  218.129043]  device_unbind_cleanup+0x12/0x80
[  218.129055]  device_release_driver_internal+0x20c/0x250
[  218.129065]  ? srso_alias_return_thunk+0x5/0xfbef5
[  218.129078]  driver_detach+0x4f/0xa0
[  218.129091]  bus_remove_driver+0x87/0x100
[  218.129101]  driver_unregister+0x35/0x60
[  218.129113]  pci_unregister_driver+0x44/0xa0
[  218.129123]  sp_pci_exit+0x19/0x20 [ccp]
[  218.129137]  sp_mod_exit+0x12/0x18 [ccp]

...

Basically, CCP driver calls pci_enable_msix_range() and looking at this function:

pci_enable_msix_range() -> __pci_enable_msix_range() -> msix_capability_init().

And, msix_capability_init() -> pci_intx_for_msi() -> pci_intx().

So it looks like drivers using MSI-X/MSI (using API such as pci_enable_msix_range()) and pcim_enable_device() will get into this issue without explicitly calling into pci_intx().

Thanks, Ashish

> Regards,
> P.
>
>> Thanks,
>> Ashish
>>