Re: [PATCH v2] x86/PCI: Prefer MMIO over PIO on all hypervisor

From: Nadav Amit
Date: Mon Oct 10 2022 - 10:58:41 EST


On Oct 4, 2022, at 11:48 AM, Nadav Amit <namit@xxxxxxxxxx> wrote:

> On Oct 4, 2022, at 1:22 AM, Alexander Graf <graf@xxxxxxxxxx> wrote:
>
>> ⚠ External Email
>>
>> Hey Nadav,
>>
>> On 03.10.22 19:34, Nadav Amit wrote:
>>> On Oct 3, 2022, at 8:03 AM, Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> wrote:
>>>
>>>> Not my but rather PCI maintainer's call but IMHO dropping 'const' is
>>>> better, introducing a new global var is our 'last resort' and should be
>>>> avoided whenever possible. Alternatively, you can add a
>>>> raw_pci_ext_ops_preferred() function checking somethin within 'struct
>>>> hypervisor_x86' but I'm unsure if it's better.
>>>>
>>>> Also, please check Alex' question/suggestion.
>>> Here is my take (and Ajay knows probably more than me):
>>>
>>> Looking briefly on MCFG, I do not see a clean way of using the ACPI table.
>>> The two options are either to use a reserved field (which who knows, might
>>> be used one day) or some OEM ID. I am also not familiar with
>>> PCI_COMMAND.MEMORY=0, so Ajay can hopefully give some answer about that.
>>>
>>> Anyhow, I understand (although not relate) to the objection for a new global
>>> variable. How about explicitly calling this hardware bug a “bug” and using
>>> the proper infrastructure? Calling it explicitly a bug may even push whoever
>>> can to resolve it.
>>
>>
>> I am a lot more concerned with how we propagate it externally than
>> within Linux. If we hard code that all Linux kernels 6.2+ that are
>> running in VMware prefer ECAM over PIO, we lock ourselves into that
>> stance for better or worse, which means:
>>
>> * All past and future versions of any VMware hypervisor product have to
>> always allow ECAM access for any PCIe config space write
>> * No other hypervisor benefits from any of this without upstream code change
>> * No real hardware platform benefits from this without upstream code change
>>
>> By moving it into MCFG, we can create a path for the outside environment
>> to tell the OS whether it's safe to use ECAM always. This obviously
>> doesn't work with MCFG as it stands today, we'd have to propose an MCFG
>> spec change to the PCI SIG's "PCI Firmware Specification" to add the
>> respective field. Future VMware versions could then always expose the
>> flag - and if you find it broken, remove it again.
>>
>> Putting all of the logic on which system potentially prefers ECAM over
>> PIO config space access into Linux is just a big hack that we should
>> avoid as much as possible.
>
> Thanks Alex. You raise important points. Let me try to break down your
> concerns slightly differently:
>
> 1. Enabling MMIO access should be selective, and potentially controlled by
> the hypervisor. The very least a "chicken-bit” is needed.
>
> 2. PCI SIG would change its specifications to address unclear hardware bug.
>
> I think (1) makes sense and we can discuss different ways of addressing it.
> But (2) would not happen in a reasonable timeline and seems to me as an
> unnecessary complication.
>
> But before we discuss how to address the issue, perhaps we need to first
> understand it better. I am not sure that I understand this MMIO bug, and so
> far nobody was able to provide exact details.
>
> So I went to have a look. It might not be super helpful, but for the record,
> here is what I collected.
>
> First, we have commit d6ece5491ae71d ("i386/x86-64 Correct for broken MCFG
> tables on K8 systems”). It tried to "try to discover all devices on bus 0
> that are unreachable using MM and fallback for them.” Interestingly, it
> seems similar to FreeBSD code (commit 2d10570afe2b3e) that also mentions K8
> and has similar detection logic in FreeBSD’s pcie_cfgregopen().
>
> Then commit a0ca9909609470 ("PCI x86: always use conf1 to access config
> space below 256 bytes”). The correspondence [1] mentions some bugs: ATI
> chipset, VIA chipset, Intel 3 Series Express chipset family and some reports
> on Nvidia. It turned out some devices had problem probing - to figure out if
> MMIO is broken - the way the previous patch did.
>
> All of these bugs are circa 2008, of course. And note that FreeBSD did not
> take a similar path. The correspondence around Linux patch is endless. I
> admit that I did not understand whether eventually the issues were found to
> be per-bus or per-device.
>
>
> Back to the matter at hand. The benefit of using the MCFG approach that you
> propose is that it can enable native systems to use MMIO as well. However,
> since the list of bugs is unclear and the problems might be device-specific,
> it is not clear what information BIOSes have that Linux doesn’t. In other
> words, the benefit of getting it into the specifications is questionable,
> and the complexity+time is high.
>
> Can we agree that the feature would be enabled explicitly by the hypervisor
> and Linux would enable it based on the hypervisor input (through some
> channel?)

Alex, is it ok with you? We will enable the feature (disable the bug)
explicitly from the hypervisor, but would not rely on MCFG changes, which
would even in the best case would take some time.