Re: [PATCH v2 0/2] Attach DT nodes to existing PCI devices

From: Bjorn Helgaas
Date: Tue Mar 19 2024 - 11:25:22 EST


[+cc Krzysztof]

On Fri, Dec 15, 2023 at 02:52:07PM +0100, Herve Codina wrote:
> On Mon, 4 Dec 2023 07:59:09 -0600
> Rob Herring <robh@xxxxxxxxxx> wrote:
> > On Mon, Dec 4, 2023 at 6:43 AM Herve Codina <herve.codina@xxxxxxxxxxx> wrote:
> > > On Fri, 1 Dec 2023 16:26:45 -0600
> > > Rob Herring <robh@xxxxxxxxxx> wrote:
> > > > On Thu, Nov 30, 2023 at 10:57 AM Herve Codina <herve.codina@xxxxxxxxxxx> wrote:
> > > > ...

> > > > --- a/drivers/pci/of.c
> > > > +++ b/drivers/pci/of.c
> > > > @@ -31,6 +31,8 @@ int pci_set_of_node(struct pci_dev *dev)
> > > > return 0;
> > > >
> > > > node = of_pci_find_child_device(dev->bus->dev.of_node, dev->devfn);
> > > > + if (!node && pci_is_bridge(dev))
> > > > + of_pci_make_dev_node(dev);
> > > > if (!node)
> > > > return 0;
> > >
> > > Maybe it is too early.
> > > of_pci_make_dev_node() creates a node and fills some properties based on
> > > some already set values available in the PCI device such as its struct resource
> > > values.
> > > We need to have some values set by the PCI infra in order to create our DT node
> > > with correct values.
> >
> > Indeed, that's probably the issue I'm having. In that case,
> > DECLARE_PCI_FIXUP_HEADER should work. That's later, but still before
> > device_add().
> >
> > I think modifying sysfs after device_add() is going to race with
> > userspace. Userspace is notified of a new device, and then the of_node
> > link may or may not be there when it reads sysfs. Also, not sure if
> > we'll need DT modaliases with PCI devices, but they won't work if the
> > DT node is not set before device_add().
>
> DECLARE_PCI_FIXUP_HEADER is too early as well as doing the DT node creation
> just before the device_add() call.
> Indeed, in order to fill the DT properties, resources need to be assigned
> (needed for the 'ranges' property used for addresses translation).
> The resources assignment is done after the call to device_add().

Do we need to know the actual address *value* before creating the
sysfs file, or is it enough to know that the file should *exist*, even
if the value may be changed later?

> Some PCI sysfs files are already created after adding the device by the
> pci_create_sysfs_dev_files() call:
> https://elixir.bootlin.com/linux/v6.6/source/drivers/pci/bus.c#L347
>
> Is it really an issue to add the of_node link to sysfs on an already
> present device ?

Yes, I think this would be an issue. We've been trying to get rid of
pci_create_sysfs_dev_files() altogether because there's a long history
of race issues related to it:

https://lore.kernel.org/r/1271099285.9831.13.camel@localhost/ WARNING: at fs/sysfs/dir.c:451 sysfs_add_one: sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:01.0/slot'
https://lore.kernel.org/r/19461.26166.427857.612983@xxxxxxxxxxxxxxxxxxx/ [2.6.35-rc1 regression] sysfs: cannot create duplicate filename ... XVR-600 related?
https://lore.kernel.org/r/20200716110423.xtfyb3n6tn5ixedh@pali/ PCI: Race condition in pci_create_sysfs_dev_files
https://lore.kernel.org/r/m3eebg9puj.fsf@xxxxxxxxxxx/ PCI: Race condition in pci_create_sysfs_dev_files (can't boot)
https://bugzilla.kernel.org/show_bug.cgi?id=215515 sysfs: cannot create duplicate filename '.../0000:e0'

And several previous attempts to fix them:

https://lore.kernel.org/r/4469eba2-188b-aab7-07d1-5c77313fc42f@xxxxxxxxx/ Guard pci_create_sysfs_dev_files with atomic value
https://lore.kernel.org/r/20230316103036.1837869-1-alexander.stein@xxxxxxxxxxxxxxx PCI/sysfs: get rid of pci_sysfs_init late_initcall
https://lore.kernel.org/r/1702093576-30405-1-git-send-email-ssengar@xxxxxxxxxxxxxxxxxxx/ PCI/sysfs: Fix race in pci sysfs creation

Bjorn