Re: [PATCH 1/3] x86/quirks: Scan all busses for early PCI quirks

From: Bjorn Helgaas
Date: Mon Nov 16 2020 - 19:19:12 EST


On Mon, Nov 16, 2020 at 05:31:36PM -0300, Guilherme G. Piccoli wrote:
> First of all, thanks everybody for the great insights/discussion! This
> thread ended up being a great learning experience for (at least) me.
>
> Given the large number of replies and intermixed comments, I wouldn't be
> able to respond inline to all, so I'll try another approach below.
>
>
> From Bjorn:
> "I think [0] proposes using early_quirks() to disable MSIs at boot-time.
> That doesn't seem like a robust solution because (a) the problem affects
> all arches but early_quirks() is x86-specific and (b) even on x86
> early_quirks() only works for PCI segment 0 because it relies on the
> 0xCF8/0xCFC I/O ports."
>
> Ah. I wasn't aware of that limitation, I thought enhancing the
> early_quirks() search to go through all buses would fix that, thanks for
> the clarification! And again, it's worth clarifying that this is not a
> problem affecting all arches _in practice_ - PowerPC for example has the
> FW primitives allowing a powerful PCI controller (out-of-band) reset,
> which usually prevents this kind of issue.
>
> [0]
> https://lore.kernel.org/linux-pci/20181018183721.27467-1-gpiccoli@xxxxxxxxxxxxx
>
>
> From Bjorn:
> "A crash_device_shutdown() could do something at the host bridge level
> if that's possible, or reset/disable bus mastering/disable MSI/etc on
> individual PCI devices if necessary."
>
> From Lukas:
> "Guilherme's original patches from 2018 iterate over all 256 PCI buses.
> That might impact boot time negatively. The reason he has to do that is
> because the crash kernel doesn't know which devices exist and which
> have interrupts enabled. However the crashing kernel has that
> information. It should either disable interrupts itself or pass the
> necessary information to the crash kernel as setup_data or whatever."

I don't think passing the device information to the kdump kernel is
really practical. The kdump kernel would use it to do PCI config
writes to disable MSIs before enabling IRQs, and it doesn't know how
to access config space that early.

We could invent special "early config access" things, but that gets
really complicated really fast. Config access depends on ACPI MCFG
tables, firmware interfaces, and in many cases, on the native host
bridge drivers in drivers/pci/controller/.
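
To make the two access methods concrete, here is a rough sketch of the
address arithmetic involved. It is only an illustration, not kernel code:
the function names are made up, and the ECAM helper assumes the window
starts at bus 0. The point is that the 0xCF8 encoding has no field for a
segment number at all, while ECAM needs a per-segment base address that
has to be discovered first (e.g. from MCFG).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Legacy x86 "configuration mechanism #1": the value written to I/O port
 * 0xCF8 before the data is accessed through port 0xCFC.
 * There is no field for a PCI segment (domain) number, so only segment 0
 * is reachable this way.  (Hypothetical helper name.)
 */
static uint32_t cf8_address(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t reg)
{
	assert(dev < 32 && fn < 8);
	return 0x80000000u |            /* enable bit */
	       ((uint32_t)bus << 16) |
	       ((uint32_t)dev << 11) |
	       ((uint32_t)fn << 8) |
	       (reg & 0xFCu);           /* dword-aligned register */
}

/*
 * ECAM: byte offset of a register inside one segment's memory-mapped
 * window, assuming the window starts at bus 0.  The window's base address
 * is per segment and is not known until something like MCFG is parsed.
 * (Hypothetical helper name.)
 */
static uint64_t ecam_offset(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t reg)
{
	assert(dev < 32 && fn < 8 && reg < 4096);
	return ((uint64_t)bus << 20) |
	       ((uint64_t)dev << 15) |
	       ((uint64_t)fn << 12) |
	       reg;
}
```

So even with a device list handed over from the crashing kernel, the
kdump kernel would still need the right base address (or a working host
bridge driver) before it could touch any of those devices.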

I think we need to disable MSIs in the crashing kernel before the
kexec. It adds a little more code in the crash_kexec() path, but it
seems like a worthwhile tradeoff.
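
For reference, the per-device operation itself is small. Below is a
minimal userspace sketch, assuming a device's 256-byte config space is
modeled as a plain byte array; the helper names are hypothetical and a
real implementation would go through the kernel's config accessors
instead. It walks the capability list and clears the MSI and MSI-X enable
bits in the Message Control registers.

```c
#include <assert.h>
#include <stdint.h>

#define CFG_SIZE         256     /* conventional PCI config space */
#define REG_STATUS       0x06
#define STATUS_CAP_LIST  0x0010  /* Status bit 4: capability list present */
#define REG_CAP_PTR      0x34
#define CAP_ID_MSI       0x05
#define CAP_ID_MSIX      0x11
#define MSI_CTRL_ENABLE  0x0001  /* MSI Message Control bit 0 */
#define MSIX_CTRL_ENABLE 0x8000  /* MSI-X Message Control bit 15 */

/* Little-endian 16-bit accessors on the modeled config space. */
static uint16_t cfg_read16(const uint8_t *cfg, unsigned int off)
{
	assert(off + 1 < CFG_SIZE);
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

static void cfg_write16(uint8_t *cfg, unsigned int off, uint16_t val)
{
	assert(off + 1 < CFG_SIZE);
	cfg[off] = (uint8_t)(val & 0xFF);
	cfg[off + 1] = (uint8_t)(val >> 8);
}

/*
 * Clear the MSI and MSI-X enable bits of one function.
 * Returns the number of capabilities whose enable bit was actually set.
 */
static int disable_msi(uint8_t *cfg)
{
	int cleared = 0;
	int ttl = 48;            /* guard against a looping capability list */
	unsigned int pos;

	if (!(cfg_read16(cfg, REG_STATUS) & STATUS_CAP_LIST))
		return 0;

	pos = cfg[REG_CAP_PTR] & 0xFC;
	while (pos >= 0x40 && ttl--) {
		uint8_t id = cfg[pos];
		uint16_t enable = 0;

		if (id == CAP_ID_MSI)
			enable = MSI_CTRL_ENABLE;
		else if (id == CAP_ID_MSIX)
			enable = MSIX_CTRL_ENABLE;

		if (enable) {
			/* Message Control sits at capability offset + 2. */
			uint16_t ctrl = cfg_read16(cfg, pos + 2);

			if (ctrl & enable) {
				cfg_write16(cfg, pos + 2,
					    (uint16_t)(ctrl & ~enable));
				cleared++;
			}
		}
		pos = cfg[pos + 1] & 0xFC;   /* next capability pointer */
	}
	return cleared;
}
```

The crashing kernel already has working config accessors and a full
device list, which is why doing this before the kexec is so much simpler
than doing it early in the kdump kernel.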

Bjorn