RE: [PATCH] PCI: Add information about describing PCI in ACPI
From: Gabriele Paoloni
Date: Tue Nov 22 2016 - 08:14:13 EST
Hi Bjorn
> -----Original Message-----
> From: Bjorn Helgaas [mailto:helgaas@xxxxxxxxxx]
> Sent: 21 November 2016 20:10
> To: Gabriele Paoloni
> Cc: Bjorn Helgaas; linux-pci@xxxxxxxxxxxxxxx; linux-
> acpi@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-arm-
> kernel@xxxxxxxxxxxxxxxxxxx; linaro-acpi@xxxxxxxxxxxxxxxx
> Subject: Re: [PATCH] PCI: Add information about describing PCI in ACPI
>
> On Mon, Nov 21, 2016 at 05:23:11PM +0000, Gabriele Paoloni wrote:
> > Hi Bjorn
> >
> > > -----Original Message-----
> > > From: linux-pci-owner@xxxxxxxxxxxxxxx [mailto:linux-pci-
> > > owner@xxxxxxxxxxxxxxx] On Behalf Of Bjorn Helgaas
> > > Sent: 21 November 2016 16:47
> > > To: Gabriele Paoloni
> > > Cc: Bjorn Helgaas; linux-pci@xxxxxxxxxxxxxxx; linux-
> > > acpi@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-arm-
> > > kernel@xxxxxxxxxxxxxxxxxxx; linaro-acpi@xxxxxxxxxxxxxxxx
> > > Subject: Re: [PATCH] PCI: Add information about describing PCI in
> ACPI
> > >
> > > On Mon, Nov 21, 2016 at 08:52:52AM +0000, Gabriele Paoloni wrote:
> > > > Hi Bjorn
> > > >
> > > > > -----Original Message-----
> > > > > From: Bjorn Helgaas [mailto:helgaas@xxxxxxxxxx]
> > > > > Sent: 18 November 2016 17:54
> > > > > To: Gabriele Paoloni
> > > > > Cc: Bjorn Helgaas; linux-pci@xxxxxxxxxxxxxxx; linux-
> > > > > acpi@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-arm-
> > > > > kernel@xxxxxxxxxxxxxxxxxxx; linaro-acpi@xxxxxxxxxxxxxxxx
> > > > > Subject: Re: [PATCH] PCI: Add information about describing PCI
> in
> > > ACPI
> > > > >
> > > > > On Fri, Nov 18, 2016 at 05:17:34PM +0000, Gabriele Paoloni
> wrote:
> > > > > > > -----Original Message-----
> > > > > > > From: linux-kernel-owner@xxxxxxxxxxxxxxx [mailto:linux-
> kernel-
> > > > > > > owner@xxxxxxxxxxxxxxx] On Behalf Of Bjorn Helgaas
> > > > > > > Sent: 17 November 2016 18:00
> > > > >
> > > > > > > +Static tables like MCFG, HPET, ECDT, etc., are *not*
> > > mechanisms
> > > > > for
> > > > > > > +reserving address space! The static tables are for things
> the
> > > OS
> > > > > > > +needs to know early in boot, before it can parse the ACPI
> > > > > namespace.
> > > > > > > +If a new table is defined, an old OS needs to operate
> > > correctly
> > > > > even
> > > > > > > +though it ignores the table. _CRS allows that because it
> is
> > > > > generic
> > > > > > > +and understood by the old OS; a static table does not.
> > > > > >
> > > > > > Right so if my understanding is correct you are saying that
> > > resources
> > > > > > described in the MCFG table should also be declared in
> PNP0C02
> > > > > devices
> > > > > > so that the PNP driver can reserve these resources.
> > > > >
> > > > > Yes.
> > > > >
> > > > > > On the other side the PCI Root bridge driver should not
> reserve
> > > such
> > > > > > resources.
> > > > > >
> > > > > > Well if my understanding is correct I think we have a problem
> > > here:
> > > > > > http://lxr.free-electrons.com/source/drivers/pci/ecam.c#L74
> > > > > >
> > > > > > As you can see pci_ecam_create() will conflict with the pnp
> > > driver
> > > > > > as it will try to reserve the resources from the MCFG
> table...
> > > > > >
> > > > > > Maybe we need to rework pci_ecam_create() ?
> > > > >
> > > > > I think it's OK as it is.
> > > > >
> > > > > The pnp/system.c driver does try to reserve PNP0C02 resources,
> and
> > > it
> > > > > marks them as "not busy". That way they appear in /proc/iomem
> and
> > > > > won't be allocated for anything else, but they can still be
> > > requested
> > > > > by drivers, e.g., pci/ecam.c, which will mark them "busy".
> > > > >
> > > > > This is analogous to what the PCI core does in
> > > pci_claim_resource().
> > > > > This is really a function of the ACPI/PNP *core*, which should
> > > reserve
> > > > > all _CRS resources for all devices (not just PNP0C02 devices).
> But
> > > > > it's done by pnp/system.c, and only for PNP0C02, because
> there's a
> > > > > bunch of historical baggage there.
> > > > >
> > > > > You'll also notice that in this case, things are out of order:
> > > > > logically the pnp/system.c reservation should happen first, but
> in
> > > > > fact the pci/ecam.c request happens *before* the pnp/system.c
> one.
> > > > > That means the pnp/system.c one might fail and complain "[mem
> ...]
> > > > > could not be reserved".
> > > >
> > > > Correct me if I am wrong...
> > > >
> > > > So currently we are relying on the fact that pci_ecam_create() is
> > > called
> > > > before the pnp driver.
> > > > If the pnp driver came first we would end up in pci_ecam_create()
> > > failing
> > > > here:
> > > > http://lxr.free-electrons.com/source/drivers/pci/ecam.c#L76
> > > >
> > > > I am not sure but it seems to me like a bit weak condition to
> rely
> > > on...
> > > > what about removing the error condition in pci_ecam_create() and
> > > logging
> > > > just a dev_info()?
> > >
> > > Huh. I'm confused. I *thought* it would be safe to reverse the
> > > order, which would effectively be this:
> > >
> > > system_pnp_probe
> > > reserve_resources_of_dev
> > > reserve_range
> > > request_mem_region([mem 0xb0000000-0xb1ffffff])
> > > ...
> > > pci_ecam_create
> > > request_resource_conflict([mem 0xb0000000-0xb1ffffff])
> > >
> > >
> > > but I experimented with the patch below on qemu, and it failed as
> you
> > > predicted:
> > >
> > > ** res test **
> > > requested [mem 0xa0000000-0xafffffff]
> > > can't claim ECAM area [mem 0xa0000000-0xafffffff]: conflict with
> ECAM
> > > PNP [mem 0xa0000000-0xafffffff]
> > >
> > > I expected the request_resource_conflict() to succeed since it's
> > > completely contained in the "ECAM PNP" region. But I guess I don't
> > > understand kernel/resource.c well enough.
> >
> > I think it fails because effectively the PNP driver is populating the
> > iomem_resource resource tree and therefore pci_ecam_create() finds
> that
> > it cannot add the cfg resource to the same hierarchy as it is already
> > there...
>
> Right. I'm just surprised because the PNP reservation is marked
> "not busy", and a driver (e.g., ECAM) should still be able to request
> the resource.
Yes unfortunately pci_ecam_create() is not flexible on the conflict as
pci_request_regions():
http://lxr.free-electrons.com/source/kernel/resource.c#L1155
if the conflict resource is not busy pci_request_regions() will create
a child resource under the conflict sibling and mark it as busy...
or at least this is my understanding...
>
> > > I'm not sure we need to fix anything yet, since we currently do the
> > > ecam.c request before the system.c one, and any change there would
> be
> > > a long ways off. If/when that *does* change, I think the correct
> fix
> > > would be to change ecam.c so its request succeeds (by changing the
> way
> > > it does the request, fixing kernel/resource.c, or whatever) rather
> > > than to reduce the log level and ignore the failure.
> >
> > Well in my mind I didn't want just to make the error disappear...
> > If all the resources should be reserved by the PNP driver then
> ideally
> > we could take away request_resource_conflict() from
> pci_ecam_create(),
> > but this would make buggy some systems with an already shipped BIOS
> > that relied on pci_ecam_create() reservation rather than PNP
> reservation.
>
> I don't want remove the request from ecam.c. Ideally, there should be
> TWO lines in /proc/iomem: one from system.c for "pnp 00:01" or
> whatever it is, and a second from ecam.c. The first is the generic
> one saying "this region is consumed by a piece of hardware, so don't
> put anything else here." The second is the driver-specific one saying
> "PCI ECAM owns this region, nobody else can use it."
>
> This is the same way we handle PCI BAR resources. Here are two
> examples from my laptop. The first (00:08.0) only has one line:
> it has a BAR that consumes address space, but I don't have a driver
> for it loaded. The second (00:16.0) does have a driver loaded, so it
> has a second line showing that the driver owns the space:
>
> f124a000-f124afff : 0000:00:08.0 # from PCI core
>
> f124d000-f124dfff : 0000:00:16.0 # from PCI core
> f124d000-f124dfff : mei_me # from mei_me driver
>
> > Just removing the error condition and converting dev_err() into
> > dev_info() seems to me like accommodating already shipped BIOS images
> > and flagging a reservation that is already done by somebody else
> > without compromising the functionality of the PCI Root bridge driver
> > (so far the only reason why I can see the error condition there is
> > to catch a buggy MCFG with overlapping addresses; so if this is the
> > case maybe we need to have a different diagnostic check to make sure
> > that the MCFG table is alright)
>
> Ideally I think we should end up with this:
>
> a0000000-afffffff : pnp 00:01
> a0000000-afffffff : PCI ECAM
I think that for PCIe device drivers it works ok because it is guaranteed
that their own pci_request_regions() is called always after
pci_claim_resource() of the bridge that is on top of them...
I.e. pci_claim_resource() reserves the resources as not busy and
pci_request_regions() will create a child busy resource
>
> Realistically right now we'll probably end up with only the "PCI ECAM"
> line in /proc/iomem and a warning from system.c about not being able
> to reserve the space.
>
> If we ever change things to do the generic PNP reservation first, then
> we should fix things so ecam.c can claim the space without an error.
Maybe the patch below could be a sort of solution...effectively pci_ecam
should succeed in reserving a busy resource under the conflict resource
in case of PNP driver allocating a non BUSY resource first...
---
drivers/pci/ecam.c | 16 +++++-----------
drivers/pci/host/pci-thunder-ecam.c | 2 +-
include/linux/pci-ecam.h | 2 +-
3 files changed, 7 insertions(+), 13 deletions(-)
diff --git a/drivers/pci/ecam.c b/drivers/pci/ecam.c
index 43ed08d..999b6ef 100644
--- a/drivers/pci/ecam.c
+++ b/drivers/pci/ecam.c
@@ -66,16 +66,10 @@ struct pci_config_window *pci_ecam_create(struct device *dev,
}
bsz = 1 << ops->bus_shift;
- cfg->res.start = cfgres->start;
- cfg->res.end = cfgres->end;
- cfg->res.flags = IORESOURCE_MEM | IORESOURCE_BUSY;
- cfg->res.name = "PCI ECAM";
-
- conflict = request_resource_conflict(&iomem_resource, &cfg->res);
- if (conflict) {
+ cfg->res = request_mem_region(cfgres->start, resource_size(cfgres), "PCI ECAM");
+ if (!cfg->res) {
err = -EBUSY;
- dev_err(dev, "can't claim ECAM area %pR: address conflict with %s %pR\n",
- &cfg->res, conflict->name, conflict);
+ dev_err(dev, "can't claim ECAM area %pR\n", &cfg->res);
goto err_exit;
}
@@ -126,8 +120,8 @@ void pci_ecam_free(struct pci_config_window *cfg)
if (cfg->win)
iounmap(cfg->win);
}
- if (cfg->res.parent)
- release_resource(&cfg->res);
+ if (cfg->res->parent)
+ release_region(cfg->res->start, resource_size(cfg->res));
kfree(cfg);
}
diff --git a/drivers/pci/host/pci-thunder-ecam.c b/drivers/pci/host/pci-thunder-ecam.c
index d50a3dc..2e48d9d 100644
--- a/drivers/pci/host/pci-thunder-ecam.c
+++ b/drivers/pci/host/pci-thunder-ecam.c
@@ -117,7 +117,7 @@ static int thunder_ecam_p2_config_read(struct pci_bus *bus, unsigned int devfn,
* the config space access window. Since we are working with
* the high-order 32 bits, shift everything down by 32 bits.
*/
- node_bits = (cfg->res.start >> 32) & (1 << 12);
+ node_bits = (cfg->res->start >> 32) & (1 << 12);
v |= node_bits;
set_val(v, where, size, val);
diff --git a/include/linux/pci-ecam.h b/include/linux/pci-ecam.h
index 7adad20..f30a4ea 100644
--- a/include/linux/pci-ecam.h
+++ b/include/linux/pci-ecam.h
@@ -36,7 +36,7 @@ struct pci_ecam_ops {
* use ECAM.
*/
struct pci_config_window {
- struct resource res;
+ struct resource *res;
struct resource busr;
void *priv;
struct pci_ecam_ops *ops;
--
2.7.4