RE: [PATCH] of: dynamic: notify before revert

From: Peng Fan
Date: Thu Feb 29 2024 - 22:04:57 EST


> Subject: Re: [PATCH] of: dynamic: notify before revert
>
> On Thu, Feb 29, 2024 at 2:01 AM Peng Fan <peng.fan@xxxxxxx> wrote:
> >
> > > Subject: Re: [PATCH] of: dynamic: notify before revert
> > >
> > > On Wed, Feb 28, 2024 at 12:13 AM Peng Fan (OSS)
> > > <peng.fan@xxxxxxxxxxx>
> > > wrote:
> > > >
> > > > From: Peng Fan <peng.fan@xxxxxxx>
> > > >
> > > > When PCI node was created using an overlay and the overlay is
> > > > reverted/destroyed, the "linux,pci-domain" property no longer
> > > > exists, so of_get_pci_domain_nr will return failure. Then
> > > > of_pci_bus_release_domain_nr will actually use the dynamic IDA,
> > > > even if the IDA was allocated in static IDA.
> > >
> > > That usage is broken to begin with unless there is a guarantee that
> > > static and dynamic domain numbers don't overlap. For example, a
> > > dynamic number is assigned and then you load an overlay with the same
> number defined in it.
> >
> > I may not describe it clear, the code is here, because overlay
> > property removed, of_get_pci_domain_nr will return failure, so the
> > code path will goest into free a dynamic ida. But actually there is no
> > such dynamic ida, so dump.
>
> I understood the problem.
>
> Your usage of this is broken on applying your overlay. You just got lucky.

Would you please point me out where is broken on using overlay?

https://github.com/siemens/jailhouse/blob/master/driver/pci.c#L458


>
> > So the problem is overlay was removed, but the notify callback may
> > still use the property to do some work.
> >
> > static void of_pci_bus_release_domain_nr(struct pci_bus *bus, struct
> > device *parent) {
> > if (bus->domain_nr < 0)
> > return;
> >
> > /* Release domain from IDA where it was allocated. */
> > if (of_get_pci_domain_nr(parent->of_node) == bus->domain_nr)
> > ida_free(&pci_domain_nr_static_ida, bus->domain_nr);
> > else
> > ida_free(&pci_domain_nr_dynamic_ida, bus->domain_nr);
> > }
> > >
> > > > So move the notify before revert to fix the issue.
> > >
> > > You can't just change the timing. Something might require notify to
> > > be after the revert.
>
> Again ^^^
>
> > >
> > > > Fixes: c14f7ccc9f5d ("PCI: Assign PCI domain IDs by ida_alloc()")
> > >
> > > I don't see where the notifier is even used.
> >
> > The stack is as below:
> >
> > [ 595.150529] CPU: 1 PID: 582 Comm: jailhouse Tainted: G O
> 6.5.0-rc4-next-20230804-05021-g3d4cc14b42ef-dirty #355
> > [ 595.161998] Hardware name: NXP i.MX93 11X11 EVK board (DT) [
> > 595.167475] Call trace:
> > [ 595.169908] dump_backtrace+0x94/0xec [ 595.173573]
> > show_stack+0x18/0x24 [ 595.176884] dump_stack_lvl+0x48/0x60 [
> > 595.180541] dump_stack+0x18/0x24 [ 595.183843]
> > pci_bus_release_domain_nr+0x34/0x84
> > [ 595.188453] pci_remove_root_bus+0xa0/0xa4 [ 595.192544]
> > pci_host_common_remove+0x28/0x40 [ 595.196895]
> > platform_remove+0x54/0x6c [ 595.200641] device_remove+0x4c/0x80 [
> > 595.204209] device_release_driver_internal+0x1d4/0x230
> > [ 595.209430] device_release_driver+0x18/0x24 [ 595.213691]
> > bus_remove_device+0xcc/0x10c [ 595.217686] device_del+0x15c/0x41c
> > [ 595.221170] platform_device_del.part.0+0x1c/0x88
> > [ 595.225861] platform_device_unregister+0x24/0x40
> > [ 595.230557] of_platform_device_destroy+0xfc/0x10c
> > [ 595.235344] of_platform_notify+0x13c/0x178 [ 595.239518]
> > blocking_notifier_call_chain+0x6c/0xa0
> > [ 595.244389] __of_changeset_entry_notify+0x148/0x16c
> > [ 595.249346] of_changeset_revert+0xa8/0xcc [ 595.253437]
> > jailhouse_pci_virtual_root_devices_remove+0x50/0x74 [jailhouse]
>
> $ git grep jailhouse_pci_virtual_root_devices_remove
> (END)
>
> Another out of tree overlay user. I have little interest in worrying about them.
> No one wants to step up and solve the problems with overlays, so I'm not
> going to worry about them either.

Ok, but I think this is indeed an issue, if driver accessing property after
property removed with overlay revert.


Thanks,
Peng.

>
> Rob