Re: [PATCH] xen/pci: try to reserve MCFG areas earlier

From: Boris Ostrovsky
Date: Sun Sep 08 2019 - 14:29:06 EST


On 9/6/19 7:00 PM, Igor Druzhinin wrote:
>
> On 06/09/2019 23:30, Boris Ostrovsky wrote:
>> On 9/3/19 8:20 PM, Igor Druzhinin wrote:
>>> If MCFG area is not reserved in E820, Xen by default will defer its usage
>>> until Dom0 registers it explicitly after ACPI parser recognizes it as
>>> a reserved resource in DSDT. Having it reserved in E820 is not
>>> mandatory according to "PCI Firmware Specification, rev 3.2" (par. 4.1.2)
>>> and firmware is free to keep a hole in E820 in that place. Xen doesn't know
>>> what exactly is inside this hole since it lacks full ACPI view of the
>>> platform therefore it's potentially harmful to access MCFG region
>>> without additional checks as some machines are known to provide
>>> inconsistent information on the size of the region.
>>>
>>> Now xen_mcfg_late() runs after acpi_init() which is too late as some basic
>>> PCI enumeration starts exactly there. Trying to register a device prior
>>> to MCFG reservation causes multiple problems with PCIe extended
>>> capability initializations in Xen (e.g. SR-IOV VF BAR sizing). There are
>>> no convenient hooks for us to subscribe to so try to register MCFG
>>> areas earlier upon the first invocation of xen_add_device().
>>
>> Where is MCFG parsed? pci_arch_init()?
> It happens twice:
> 1) first time early one in pci_arch_init() that is arch_initcall - that
> time pci_mmcfg_list will be freed immediately there because MCFG area is
> not reserved in E820;
> 2) second time late one in acpi_init() which is subsystem_initcall right
> before where PCI enumeration starts - this time ACPI tables will be
> checked for a reserved resource and pci_mmcfg_list will be finally
> populated.
>
> The problem is that on a system that doesn't have MCFG area reserved in
> E820 pci_mmcfg_list is empty before acpi_init() and our PCI hooks are
> called in the same place. So MCFG is still not in use by Xen at this
> point since we haven't reached our xen_mcfg_late().


Would it be possible for us to parse MCFG ourselves in pci_xen_init()? I
realize that we'd be doing this twice (or maybe even three times since
apparently both pci_arch_init() and acpi_init() do it).
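
To give a rough idea of what walking the table ourselves involves, here is
a user-space sketch, not kernel code. It assumes the standard MCFG layout
(36-byte ACPI header, 8 reserved bytes, then 16-byte allocation entries).
The names mcfg_area, mcfg_parse() and mcfg_area_size() are made up for
illustration, and the table checksum is deliberately not verified.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ACPI_HEADER_LEN   36u  /* standard ACPI table header */
#define MCFG_RESERVED_LEN 8u   /* reserved bytes following the header */
#define MCFG_ENTRY_LEN    16u  /* one configuration space allocation entry */

/* Hypothetical decoded form of one MCFG allocation entry. */
struct mcfg_area {
	uint64_t base;       /* ECAM base address */
	uint16_t segment;    /* PCI segment group number */
	uint8_t  start_bus;
	uint8_t  end_bus;
};

/* Read an n-byte little-endian integer (n <= 8). */
static uint64_t rd_le(const uint8_t *p, size_t n)
{
	uint64_t v = 0;

	while (n--)
		v = (v << 8) | p[n];
	return v;
}

/*
 * Decode up to 'max' entries from a raw MCFG table of 'avail' bytes.
 * Returns the number of entries written to 'out', or -1 if the table
 * is malformed. The checksum is not checked in this sketch.
 */
static int mcfg_parse(const uint8_t *tbl, size_t avail,
		      struct mcfg_area *out, size_t max)
{
	const size_t fixed = ACPI_HEADER_LEN + MCFG_RESERVED_LEN;
	uint32_t len;
	size_t i, n;

	if (avail < fixed || memcmp(tbl, "MCFG", 4) != 0)
		return -1;

	/* Table length lives at offset 4 of the ACPI header. */
	len = (uint32_t)rd_le(tbl + 4, 4);
	if (len < fixed || len > avail)
		return -1;

	n = (len - fixed) / MCFG_ENTRY_LEN;
	if (n > max)
		n = max;

	for (i = 0; i < n; i++) {
		const uint8_t *e = tbl + fixed + i * MCFG_ENTRY_LEN;

		out[i].base      = rd_le(e, 8);
		out[i].segment   = (uint16_t)rd_le(e + 8, 2);
		out[i].start_bus = e[10];
		out[i].end_bus   = e[11];
	}
	return (int)n;
}

/* ECAM uses 1 MiB of address space per bus (32 devs * 8 fns * 4 KiB). */
static uint64_t mcfg_area_size(const struct mcfg_area *a)
{
	return ((uint64_t)a->end_bus - a->start_bus + 1) << 20;
}
```

Each decoded entry carries the base address, segment and bus range, which
is the information needed to describe one MCFG area.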

-boris