RE: [PATCH] PCI: Add information about describing PCI in ACPI
From: Gabriele Paoloni
Date: Mon Nov 21 2016 - 12:23:29 EST
Hi Bjorn
> -----Original Message-----
> From: linux-pci-owner@xxxxxxxxxxxxxxx [mailto:linux-pci-
> owner@xxxxxxxxxxxxxxx] On Behalf Of Bjorn Helgaas
> Sent: 21 November 2016 16:47
> To: Gabriele Paoloni
> Cc: Bjorn Helgaas; linux-pci@xxxxxxxxxxxxxxx; linux-
> acpi@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-arm-
> kernel@xxxxxxxxxxxxxxxxxxx; linaro-acpi@xxxxxxxxxxxxxxxx
> Subject: Re: [PATCH] PCI: Add information about describing PCI in ACPI
>
> On Mon, Nov 21, 2016 at 08:52:52AM +0000, Gabriele Paoloni wrote:
> > Hi Bjorn
> >
> > > -----Original Message-----
> > > From: Bjorn Helgaas [mailto:helgaas@xxxxxxxxxx]
> > > Sent: 18 November 2016 17:54
> > > To: Gabriele Paoloni
> > > Cc: Bjorn Helgaas; linux-pci@xxxxxxxxxxxxxxx; linux-
> > > acpi@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-arm-
> > > kernel@xxxxxxxxxxxxxxxxxxx; linaro-acpi@xxxxxxxxxxxxxxxx
> > > Subject: Re: [PATCH] PCI: Add information about describing PCI in
> ACPI
> > >
> > > On Fri, Nov 18, 2016 at 05:17:34PM +0000, Gabriele Paoloni wrote:
> > > > > -----Original Message-----
> > > > > From: linux-kernel-owner@xxxxxxxxxxxxxxx [mailto:linux-kernel-
> > > > > owner@xxxxxxxxxxxxxxx] On Behalf Of Bjorn Helgaas
> > > > > Sent: 17 November 2016 18:00
> > >
> > > > > +Static tables like MCFG, HPET, ECDT, etc., are *not*
> mechanisms
> > > for
> > > > > +reserving address space! The static tables are for things the
> OS
> > > > > +needs to know early in boot, before it can parse the ACPI
> > > namespace.
> > > > > +If a new table is defined, an old OS needs to operate
> correctly
> > > even
> > > > > +though it ignores the table. _CRS allows that because it is
> > > generic
> > > > > +and understood by the old OS; a static table does not.
> > > >
> > > > Right so if my understanding is correct you are saying that
> resources
> > > > described in the MCFG table should also be declared in PNP0C02
> > > devices
> > > > so that the PNP driver can reserve these resources.
> > >
> > > Yes.
> > >
> > > > On the other side the PCI Root bridge driver should not reserve
> such
> > > > resources.
> > > >
> > > > Well if my understanding is correct I think we have a problem
> here:
> > > > http://lxr.free-electrons.com/source/drivers/pci/ecam.c#L74
> > > >
> > > > As you can see pci_ecam_create() will conflict with the pnp
> driver
> > > > as it will try to reserve the resources from the MCFG table...
> > > >
> > > > Maybe we need to rework pci_ecam_create() ?
> > >
> > > I think it's OK as it is.
> > >
> > > The pnp/system.c driver does try to reserve PNP0C02 resources, and
> it
> > > marks them as "not busy". That way they appear in /proc/iomem and
> > > won't be allocated for anything else, but they can still be
> requested
> > > by drivers, e.g., pci/ecam.c, which will mark them "busy".
> > >
> > > This is analogous to what the PCI core does in
> pci_claim_resource().
> > > This is really a function of the ACPI/PNP *core*, which should
> reserve
> > > all _CRS resources for all devices (not just PNP0C02 devices). But
> > > it's done by pnp/system.c, and only for PNP0C02, because there's a
> > > bunch of historical baggage there.
> > >
> > > You'll also notice that in this case, things are out of order:
> > > logically the pnp/system.c reservation should happen first, but in
> > > fact the pci/ecam.c request happens *before* the pnp/system.c one.
> > > That means the pnp/system.c one might fail and complain "[mem ...]
> > > could not be reserved".
> >
> > Correct me if I am wrong...
> >
> > So currently we are relying on the fact that pci_ecam_create() is
> called
> > before the pnp driver.
> > If the pnp driver came first we would end up in pci_ecam_create()
> failing
> > here:
> > http://lxr.free-electrons.com/source/drivers/pci/ecam.c#L76
> >
> > I am not sure but it seems to me like a bit weak condition to rely
> on...
> > what about removing the error condition in pci_ecam_create() and
> logging
> > just a dev_info()?
>
> Huh. I'm confused. I *thought* it would be safe to reverse the
> order, which would effectively be this:
>
> system_pnp_probe
> reserve_resources_of_dev
> reserve_range
> request_mem_region([mem 0xb0000000-0xb1ffffff])
> ...
> pci_ecam_create
> request_resource_conflict([mem 0xb0000000-0xb1ffffff])
>
>
> but I experimented with the patch below on qemu, and it failed as you
> predicted:
>
> ** res test **
> requested [mem 0xa0000000-0xafffffff]
> can't claim ECAM area [mem 0xa0000000-0xafffffff]: conflict with ECAM
> PNP [mem 0xa0000000-0xafffffff]
>
> I expected the request_resource_conflict() to succeed since it's
> completely contained in the "ECAM PNP" region. But I guess I don't
> understand kernel/resource.c well enough.
I think it fails because effectively the PNP driver is populating the
iomem_resource resource tree and therefore pci_ecam_create() finds that
it cannot add the cfg resource to the same hierarchy as it is already
there...
>
> I'm not sure we need to fix anything yet, since we currently do the
> ecam.c request before the system.c one, and any change there would be
> a long ways off. If/when that *does* change, I think the correct fix
> would be to change ecam.c so its request succeeds (by changing the way
> it does the request, fixing kernel/resource.c, or whatever) rather
> than to reduce the log level and ignore the failure.
Well in my mind I didn't want just to make the error disappear...
If all the resources should be reserved by the PNP driver then ideally
we could take away request_resource_conflict() from pci_ecam_create(),
but this would make buggy some systems with an already shipped BIOS
that relied on pci_ecam_create() reservation rather than PNP reservation.
Just removing the error condition and converting dev_err() into
dev_info() seems to me like accommodating already shipped BIOS images
and flagging a reservation that is already done by somebody else
without compromising the functionality of the PCI Root bridge driver
(so far the only reason why I can see the error condition there is
to catch a buggy MCFG with overlapping addresses; so if this is the
case maybe we need to have a different diagnostic check to make sure
that the MCFG table is alright)
BTW if you think that so far we can keep this as it is I am ok.
Many Thanks
Gab
>
> Bjorn
>
>
> diff --git a/arch/x86/pci/init.c b/arch/x86/pci/init.c
> index adb62aa..5a35638 100644
> --- a/arch/x86/pci/init.c
> +++ b/arch/x86/pci/init.c
> @@ -7,6 +7,8 @@
> in the right sequence from here. */
> static __init int pci_arch_init(void)
> {
> + struct resource *res, *conflict;
> + static struct resource cfg;
> #ifdef CONFIG_PCI_DIRECT
> int type = 0;
>
> @@ -39,6 +41,26 @@ static __init int pci_arch_init(void)
>
> dmi_check_skip_isa_align();
>
> + printk("\n** res test **\n");
> +
> + res = request_mem_region(0xa0000000, 0x10000000, "ECAM PNP");
> + printk("requested %pR\n", res);
> + if (!res)
> + return 0;
> + res->flags &= ~IORESOURCE_BUSY;
> +
> + cfg.start = 0xa0000000;
> + cfg.end = 0xafffffff;
> + cfg.flags = IORESOURCE_MEM | IORESOURCE_BUSY;
> + cfg.name = "PCI ECAM";
> +
> + conflict = request_resource_conflict(&iomem_resource, &cfg);
> + if (conflict)
> + printk("can't claim ECAM area %pR: conflict with %s %pR\n",
> + &cfg, conflict->name, conflict);
> +
> + printk("\n");
> +
> return 0;
> }
> arch_initcall(pci_arch_init);
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html