Re: [PATCH v3 1/3] PCI: Introduce pcibios_ignore_alignment_request

From: Shawn Anastasio
Date: Thu May 30 2019 - 18:53:13 EST


On 5/29/19 10:39 PM, Alexey Kardashevskiy wrote:


On 28/05/2019 17:39, Shawn Anastasio wrote:


On 5/28/19 1:27 AM, Alexey Kardashevskiy wrote:


On 28/05/2019 15:36, Oliver wrote:
On Tue, May 28, 2019 at 2:03 PM Shawn Anastasio <shawn@xxxxxxxxxx>
wrote:

Introduce a new pcibios function pcibios_ignore_alignment_request
which allows the PCI core to defer to platform-specific code to
determine whether or not to ignore alignment requests for PCI
resources.

The existing behavior is to simply ignore alignment requests when
PCI_PROBE_ONLY is set. This behavior is maintained by the
default implementation of pcibios_ignore_alignment_request.

Signed-off-by: Shawn Anastasio <shawn@xxxxxxxxxx>
---
 drivers/pci/pci.c | 9 +++++++--
 include/linux/pci.h | 1 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8abc843b1615..8207a09085d1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5882,6 +5882,11 @@ resource_size_t __weak pcibios_default_alignment(void)
 	return 0;
 }

+int __weak pcibios_ignore_alignment_request(void)
+{
+	return pci_has_flag(PCI_PROBE_ONLY);
+}
+
 #define RESOURCE_ALIGNMENT_PARAM_SIZE COMMAND_LINE_SIZE
 static char resource_alignment_param[RESOURCE_ALIGNMENT_PARAM_SIZE] = {0};
 static DEFINE_SPINLOCK(resource_alignment_lock);
@@ -5906,9 +5911,9 @@ static resource_size_t pci_specified_resource_alignment(struct pci_dev *dev,
 	p = resource_alignment_param;
 	if (!*p && !align)
 		goto out;
-	if (pci_has_flag(PCI_PROBE_ONLY)) {
+	if (pcibios_ignore_alignment_request()) {
 		align = 0;
-		pr_info_once("PCI: Ignoring requested alignments (PCI_PROBE_ONLY)\n");
+		pr_info_once("PCI: Ignoring requested alignments\n");
 		goto out;
 	}

I think the logic here is questionable to begin with. If the user has
explicitly requested re-aligning a resource via the command line then
we should probably do it even if PCI_PROBE_ONLY is set. When it breaks
they get to keep the pieces.
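
To make the difference concrete, here is a rough userspace sketch of the two policies. It is only a model, not kernel code: struct align_inputs, align_current() and align_suggested() are made-up names, and the three fields merely stand in for pci_has_flag(PCI_PROBE_ONLY), pcibios_default_alignment() and a matching resource_alignment= entry.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct align_inputs {
	bool probe_only;       /* stands in for pci_has_flag(PCI_PROBE_ONLY) */
	size_t default_align;  /* stands in for pcibios_default_alignment() */
	size_t cmdline_align;  /* alignment from resource_alignment=, 0 if none */
};

/* Behaviour in the quoted hunk: PCI_PROBE_ONLY drops every request. */
static size_t align_current(const struct align_inputs *in)
{
	if (!in->cmdline_align && !in->default_align)
		return 0;
	if (in->probe_only)
		return 0;
	return in->cmdline_align ? in->cmdline_align : in->default_align;
}

/* Suggested behaviour: an explicit user request wins over PCI_PROBE_ONLY. */
static size_t align_suggested(const struct align_inputs *in)
{
	if (in->cmdline_align)
		return in->cmdline_align;
	if (in->probe_only)
		return 0;
	return in->default_align;
}
```

With probe_only set and a 64K command-line request, align_current() returns 0 while align_suggested() returns the requested 64K; the two agree whenever the command line is empty.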

That said, the real issue here is that PCI_PROBE_ONLY probably
shouldn't be set under qemu/kvm. Under the other hypervisor (PowerVM)
hotplugged devices are configured by firmware before they're passed to
the guest and we need to keep the FW assignments otherwise things
break. QEMU however doesn't do any BAR assignments and relies on that
being handled by the guest. At boot time this is done by SLOF, but
Linux only keeps SLOF around until it's extracted the device-tree.
Once that's done SLOF gets blown away and the kernel needs to do its
own BAR assignments. I'm guessing there's a hack in there to make it
work today, but it's a little surprising that it works at all...


The hack is to run a modified qemu-aware "/usr/sbin/rtas_errd" in the
guest which receives an event from qemu (RAS_EPOW from
/proc/interrupts), fetches device tree chunks (as I understand it,
they come with BARs from phyp but without them from qemu) and writes "1"
to "/sys/bus/pci/rescan", which eventually calls pci_assign_resource():

Interesting. Does this mean that the PHYP hotplug path doesn't
call pci_assign_resource?


I'd expect dlpar_add_slot() to be called under phyp and eventually
pci_device_add() which (I think) may or may not trigger later reassignment.


If so it means the patch may not
break that platform after all, though it still may not be
the correct way of doing things.


We should probably stop enforcing the PCI_PROBE_ONLY flag - it seems
that (unless resource_alignment= is used) the pseries guest should just
walk through all allocated resources and leave them unchanged.

If we add a pcibios_default_alignment() implementation like was
suggested earlier, then it will behave as if the user has
specified resource_alignment= by default and SLOF's assignments
won't be honored (I think).

I guess it boils down to one question - is it important that we
observe SLOF's initial BAR assignments? If not, the device tree
modification that Sam found would work fine here. Otherwise,
we need a way to honor the initial assignments from SLOF while
still allowing custom alignments for hotplugged devices, either
by deferring to the platform code like I do here, unsetting
PCI_PROBE_ONLY in certain cases or by using IORESOURCE_PCI_FIXED
like Bjorn suggested.
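
For what it's worth, the "fixed" option can be pictured with the userspace model below. This is a sketch under loud assumptions: MODEL_RES_FIXED only stands in for IORESOURCE_PCI_FIXED, struct model_res and the model_realign*() helpers are invented for illustration, and rounding the start address up is a simplification that does not reproduce what the kernel's realignment code actually does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag standing in for IORESOURCE_PCI_FIXED. */
#define MODEL_RES_FIXED 0x1u

/* Hypothetical, heavily simplified stand-in for a BAR resource. */
struct model_res {
	uint64_t start;
	uint64_t size;
	unsigned int flags;
};

/* Round start up to a power-of-two alignment unless the resource is fixed. */
static void model_realign(struct model_res *r, uint64_t align)
{
	if (r->flags & MODEL_RES_FIXED)
		return;		/* keep the firmware (SLOF) assignment */
	if (align == 0 || (align & (align - 1)) != 0)
		return;		/* ignore zero or non-power-of-two requests */
	r->start = (r->start + align - 1) & ~(align - 1);
}

/* Apply the same alignment request to every resource in the array. */
static void model_realign_all(struct model_res *res, size_t n, uint64_t align)
{
	for (size_t i = 0; i < n; i++)
		model_realign(&res[i], align);
}
```

In this model a boot-time BAR marked fixed keeps its original address, while an unmarked (hotplugged) BAR is moved to the next aligned address.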



[c000000006e6f960] [c0000000005f62d4] pci_assign_resource+0x44/0x360
[c000000006e6fa10] [c0000000005f8b54] assign_requested_resources_sorted+0x84/0x110
[c000000006e6fa60] [c0000000005f9540] __assign_resources_sorted+0xd0/0x750
[c000000006e6fb40] [c0000000005fb2e0] __pci_bus_assign_resources+0x80/0x280
[c000000006e6fc00] [c0000000005fb95c] pci_assign_unassigned_bus_resources+0xbc/0x100
[c000000006e6fc60] [c0000000005e3d74] pci_rescan_bus+0x34/0x60
[c000000006e6fc90] [c0000000005f1ef4] rescan_store+0x84/0xc0
[c000000006e6fcd0] [c00000000068060c] bus_attr_store+0x3c/0x60
[c000000006e6fcf0] [c00000000037853c] sysfs_kf_write+0x5c/0x80

IIRC Sam Bobroff was looking at hotplug under pseries recently so he
might have something to add. He's sick at the moment, but I'll ask him
to take a look at this once he's back among the living.

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 4a5a84d7bdd4..47471dcdbaf9 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1990,6 +1990,7 @@ static inline void pcibios_penalize_isa_irq(int irq, int active) {}
 int pcibios_alloc_irq(struct pci_dev *dev);
 void pcibios_free_irq(struct pci_dev *dev);
 resource_size_t pcibios_default_alignment(void);
+int pcibios_ignore_alignment_request(void);

 #ifdef CONFIG_HIBERNATE_CALLBACKS
 extern struct dev_pm_ops pcibios_pm_ops;
--
2.20.1