Re: [PATCH v1 4/4] PCI: Allow extend_bridge_window() to shrink resource if necessary

From: Nicholas Johnson
Date: Tue Jan 07 2020 - 20:36:13 EST


On Tue, Jan 07, 2020 at 02:34:35PM -0600, Bjorn Helgaas wrote:
> On Mon, Jan 06, 2020 at 03:48:06PM +0000, Nicholas Johnson wrote:
> > Remove checks for resource size in extend_bridge_window(). This is
> > necessary to allow the pci_bus_distribute_available_resources() to
> > function when the kernel parameter pci=hpmemsize=nn[KMG] is used to
> > allocate resources. Because the kernel parameter sets the size of all
> > hotplug bridges to be the same, there are problems when nested hotplug
> > bridges are encountered. Fitting a downstream hotplug bridge with size X
> > and normal bridges with non-zero size Y into parent hotplug bridge with
> > size X is impossible, and hence the downstream hotplug bridge needs to
> > shrink to fit into its parent.
>
> s/extend_bridge_window()/adjust_bridge_window()/ above
> s/to allow the/to allow/

Okay
>
> If this patch allows pci_bus_distribute_available_resources() to
> function when pci=hpmemsize=nn is used, what happens *before* this
> patch? The text implies that pci_bus_distribute_available_resources()
> doesn't function, but what happens? Do we try to assign a downstream
> bridge requiring X+n inside an upstream window of size X and the
> assignment fails, leaving the downstream bridge unusable?

I could add something similar to this to the log:

The hpmemsize is applied to add_size of every hotplug bridge, even
nested ones. Say we set hpmemsize=256M: the upstream hotplug bridge gets
256M. Then, when we hot-add a Thunderbolt device with daisy chaining, the
new nested bridge also asks for 256M, which will not fit because some
of the parent's space has already been consumed by the endpoints in the
Thunderbolt device. Hence, we cannot extend, and the nested bridge has to
shrink to fit into its parent.

It works for Mika because he is interested in the cases where the
firmware assigns the resources. There hpmemsize stays at its default of
2M, which does not cause problems unless we run out of space and need to
go below 2M.
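
To make the arithmetic concrete, here is a small user-space sketch (not
kernel code). The helper names and the 2M consumed by endpoints are
assumptions chosen for illustration only; the real numbers depend on the
device that is hot-added.

```c
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/*
 * Hypothetical helper: space left in the parent hotplug bridge window
 * once sibling resources (endpoints, normal bridges) have been placed.
 */
static uint64_t space_left_for_nested(uint64_t parent_size, uint64_t consumed)
{
	return consumed >= parent_size ? 0 : parent_size - consumed;
}

/*
 * Hypothetical helper: the size the nested hotplug bridge can actually
 * get. It asks for hpmemsize, but cannot exceed what the parent has left.
 */
static uint64_t nested_window_size(uint64_t parent_size, uint64_t consumed,
				   uint64_t hpmemsize)
{
	uint64_t avail = space_left_for_nested(parent_size, consumed);

	return hpmemsize <= avail ? hpmemsize : avail;
}
```

With hpmemsize=256M and an assumed 2M taken by endpoints,
nested_window_size(256 * MiB, 2 * MiB, 256 * MiB) gives 254M, so the
nested bridge must end up smaller than its requested 256M. With the
default of 2M the request fits and nothing needs to shrink.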
>
> > Add check for if bridge is extended or shrunken and reflect that in the
> > call to pci_dbg().
> >
> > Reset the resource if its new size is zero (if we have run out of a
> > bridge window resource) to prevent the PCI resource assignment code from
> > attempting to assign a zero-sized resource.
> >
> > Signed-off-by: Nicholas Johnson <nicholas.johnson-opensource@xxxxxxxxxxxxxx>
> > ---
> > drivers/pci/setup-bus.c | 17 ++++++++++++-----
> > 1 file changed, 12 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
> > index 0c51f4937..e7e57bf72 100644
> > --- a/drivers/pci/setup-bus.c
> > +++ b/drivers/pci/setup-bus.c
> > @@ -1836,18 +1836,25 @@ static void adjust_bridge_window(struct pci_dev *bridge, struct resource *res,
> > struct list_head *add_list,
> > resource_size_t new_size)
> > {
> > - resource_size_t add_size;
> > + resource_size_t add_size, size = resource_size(res);
> >
> > if (res->parent)
> > return;
> >
> > - if (resource_size(res) >= new_size)
> > - return;
> > + if (new_size > size) {
> > + add_size = new_size - size;
> > + pci_dbg(bridge, "bridge window %pR extended by %pa\n", res,
> > + &add_size);
> > + } else if (new_size < size) {
> > + add_size = size - new_size;
> > + pci_dbg(bridge, "bridge window %pR shrunken by %pa\n", res,
> > + &add_size);
> > + }
>
> Where's the patch that changes the caller so "new_size" may be smaller
> than "size"? I guess it must be "[3/3] PCI: Consider alignment of
> hot-added bridges ..." because that's the only one that makes a
> non-trivial change, right?

As above, there was always a possibility of the new_size being smaller.
For some reason, 1M is assigned to bridges, even if nothing is below
them (for example, unused non-hotplug bridges in a Thunderbolt dock). It
may be an edge case if we are low on space, but theoretically it can
happen.

Also, when writing this, Mika was not interested in using hpmemsize,
which, when used, will cause new_size to be smaller than the current
size (actual size and add_size combined).
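
As a rough illustration of the behaviour this patch is after, here is a
user-space model of the resize step. It is a sketch only: struct window,
adjust_window() and the enum are made-up names, and the add_list handling
and the zero-size case from the real function are left out.

```c
#include <stdint.h>
#include <stddef.h>

/* Simplified stand-in for a bridge window resource (not struct resource). */
struct window {
	uint64_t start;
	uint64_t end;			/* inclusive, as in the kernel */
	const struct window *parent;	/* non-NULL once assigned */
};

enum adjust_result {
	WINDOW_ALREADY_ASSIGNED,
	WINDOW_UNCHANGED,
	WINDOW_EXTENDED,
	WINDOW_SHRUNK,
};

static uint64_t window_size(const struct window *w)
{
	return w->end - w->start + 1;
}

/*
 * Resize the window to new_size in either direction. There is no early
 * return when new_size is smaller than the current size.
 */
static enum adjust_result adjust_window(struct window *w, uint64_t new_size)
{
	uint64_t size = window_size(w);
	enum adjust_result result = WINDOW_UNCHANGED;

	if (w->parent)
		return WINDOW_ALREADY_ASSIGNED;

	if (new_size > size)
		result = WINDOW_EXTENDED;
	else if (new_size < size)
		result = WINDOW_SHRUNK;

	w->end = w->start + new_size - 1;
	return result;
}
```

Calling adjust_window() on an unassigned 256M window with a new_size of
254M reports WINDOW_SHRUNK and leaves the window at 254M, which is the
hpmemsize case described above.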

So it does not need a patch to cause "new_size" to be smaller than
"size" - just a change in user behaviour to use pci=hpmemsize.
>
> > - add_size = new_size - resource_size(res);
> > - pci_dbg(bridge, "bridge window %pR extended by %pa\n", res, &add_size);
> > res->end = res->start + new_size - 1;
> > remove_from_list(add_list, res);
> > + if (!new_size)
> > + reset_resource(res);
>
> I consider reset_resource() to be deprecated because it throws away
> res->flags, which tells us what kind of resource it is
> (mem/io/32-bit/64-bit/prefetchable). We learn this during
> enumeration, and we shouldn't forget the information until we remove
> the device.

I will look at this, but I distinctly remember doing this because of IO
BARs which would run out and cause the device not to be enabled.

Can you please comment on IORESOURCE_UNSET and what effect it would
have if applied to the flags? Also, can you suggest any solution other
than making the code handle zero-sized resources better? I agree that
handling zero-sized resources properly would be ideal, but given the
state of drivers/pci/setup-bus.c, I feel like it will just recursively
open up more cans of worms until we metaphorically stack overflow. And I
am happy to go down that path. But at some point I feel we will have to
make some compromises / stop-gap measures to apply some patches and make
progress, before going down that road. Would you agree?
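
To frame the question, here is a user-space sketch of the two options as
I understand them. The struct, the helper names and the flag values are
stand-ins for this example, and sketch_mark_unset() is only my guess at
what using IORESOURCE_UNSET might look like, not something taken from
the tree.

```c
#include <stdint.h>

/* Stand-in flag bits for this sketch only. */
#define SKETCH_RES_IO		0x00000100UL
#define SKETCH_RES_MEM		0x00000200UL
#define SKETCH_RES_UNSET	0x20000000UL

/* Simplified stand-in for a resource (not struct resource). */
struct sketch_res {
	uint64_t start;
	uint64_t end;
	unsigned long flags;
};

/* Option 1: clear everything, which also forgets the resource type. */
static void sketch_reset(struct sketch_res *res)
{
	res->start = 0;
	res->end = 0;
	res->flags = 0;
}

/*
 * Option 2 (assumption): keep the type bits and the size, drop the
 * address, and mark the resource as not having one.
 */
static void sketch_mark_unset(struct sketch_res *res)
{
	res->end -= res->start;
	res->start = 0;
	res->flags |= SKETCH_RES_UNSET;
}
```

After sketch_reset() an IO window can no longer be told apart from a
memory window, which is the objection raised above. After
sketch_mark_unset() the IO bit is still there, so the type information
from enumeration survives.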
>
> If the resource assignment code doesn't do the right thing with a
> zero-sized resource, I think we should fix that code. Clearing the
> resource struct does nothing with the hardware BAR or window
> registers, so the BAR/window remains enabled unless we do something
> more. If we don't need a window and we want to disable it, we can do
> that, but it requires writing special values to the hardware
> registers.
>
> Bjorn

https://lkml.org/lkml/2020/1/7/1544

You describe what appears to be the assignment code, which handles
lists of resources, as "black magic code". And I agree. I believe
it is in both our interests to avoid using add_size because nobody
understands how it is handled. There may be bugs, and there is
definitely lots of complexity involved. I believe simplicity is key.
Hence these changes in this series:

- Change resource size directly instead of using add_size

- Remove the check that prevents the resource from shrinking. Aside from
the currently known cases that need the resource to shrink, we cannot
rule out more such cases in the future. There is no reason to prevent
shrinking: we have an available size for the bridge window, and if that
happens to be smaller than the current bridge window, we have no choice
but to shrink.

Regards,
Nicholas