Re: [PATCH] xen/pci: try to reserve MCFG areas earlier

From: Boris Ostrovsky
Date: Mon Sep 09 2019 - 15:20:21 EST


On 9/8/19 7:37 PM, Igor Druzhinin wrote:
> On 09/09/2019 00:30, Boris Ostrovsky wrote:
>> On 9/8/19 5:11 PM, Igor Druzhinin wrote:
>>> On 08/09/2019 19:28, Boris Ostrovsky wrote:
>>>> On 9/6/19 7:00 PM, Igor Druzhinin wrote:
>>>>> On 06/09/2019 23:30, Boris Ostrovsky wrote:
>>>>>> Where is MCFG parsed? pci_arch_init()?
>>>>> It happens twice:
>>>>> 1) first time early one in pci_arch_init() that is arch_initcall - that
>>>>> time pci_mmcfg_list will be freed immediately there because MCFG area is
>>>>> not reserved in E820;
>>>>> 2) second time late one in acpi_init() which is subsystem_initcall right
>>>>> before where PCI enumeration starts - this time ACPI tables will be
>>>>> checked for a reserved resource and pci_mmcfg_list will be finally
>>>>> populated.
>>>>>
>>>>> The problem is that on a system that doesn't have MCFG area reserved in
>>>>> E820 pci_mmcfg_list is empty before acpi_init() and our PCI hooks are
>>>>> called in the same place. So MCFG is still not in use by Xen at this
>>>>> point since we haven't reached our xen_mcfg_late().
>>>> Would it be possible for us to parse MCFG ourselves in pci_xen_init()? I
>>>> realize that we'd be doing this twice (or maybe even three times since
>>>> apparently both pci_arch_init() and acpi_init() do it).
>>>>
>>> I don't think it makes sense:
>>> a) it needs to be done after ACPI is initialized since we need to parse
>>> it to figure out the exact reserved region - that's why it's currently
>>> done in acpi_init() (see commit message for the reasons why)
>> Hmm... We should be able to parse ACPI tables by the time
>> pci_arch_init() is called. In fact, if you look at
>> pci_mmcfg_early_init() you will see that it does just that.
>>
> The point is not to parse MCFG after acpi_init but to parse DSDT for
> reserved resource which could be done only after ACPI initialization.

OK, I think I understand now what you are trying to do --- you are
essentially trying to account for the range inserted by
setup_mcfg_map(), right?

The other question I have is why you think it's worth keeping
xen_mcfg_late() as a late initcall. How could MCFG info be updated
between acpi_init() and late_initcalls being run? I'd think it can only
happen when a new device is hotplugged.
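
To make the ordering concrete, here is a toy userspace model of the
sequence described in this thread. It is only a sketch, not kernel code:
the struct, the model_*() helpers and the notify_early flag are all
hypothetical names made up for illustration, and it assumes the DSDT
does reserve the MCFG area. The initcall levels (arch = 3, subsys = 4,
late = 7) are the usual kernel ones.

```c
#include <assert.h>
#include <stdbool.h>

/* Initcall levels relevant to this thread, in execution order. */
enum level { LEVEL_ARCH = 3, LEVEL_SUBSYS = 4, LEVEL_LATE = 7 };

/* Hypothetical boot state; none of these fields exist in the kernel. */
struct boot_model {
	bool mcfg_reserved_in_e820;    /* firmware marked MCFG reserved in E820 */
	bool mmcfg_list_populated;     /* stands in for pci_mmcfg_list being non-empty */
	bool pci_enumerated;           /* ACPI PCI enumeration has already run */
	bool xen_notified;             /* stands in for PHYSDEVOP_pci_mmcfg_reserved */
	bool xen_notified_before_enum; /* did Xen learn about MCFG in time? */
};

/* Tell Xen about MCFG once, and record whether that beat enumeration. */
static void notify_xen(struct boot_model *m)
{
	if (!m->mmcfg_list_populated || m->xen_notified)
		return;
	m->xen_notified = true;
	m->xen_notified_before_enum = !m->pci_enumerated;
}

/* LEVEL_ARCH: the early check keeps the list only if E820 reserves MCFG. */
static void model_pci_arch_init(struct boot_model *m)
{
	m->mmcfg_list_populated = m->mcfg_reserved_in_e820;
}

/*
 * LEVEL_SUBSYS: the late check consults ACPI (assumed here to succeed),
 * then PCI enumeration starts. notify_early models telling Xen in between.
 */
static void model_acpi_init(struct boot_model *m, bool notify_early)
{
	m->mmcfg_list_populated = true;
	if (notify_early)
		notify_xen(m);
	m->pci_enumerated = true;
}

/* LEVEL_LATE: the existing late initcall. */
static void model_xen_mcfg_late(struct boot_model *m)
{
	notify_xen(m);
}

/* Run the three steps in initcall-level order and return the final state. */
struct boot_model run_boot(bool reserved_in_e820, bool notify_early)
{
	struct boot_model m = { .mcfg_reserved_in_e820 = reserved_in_e820 };

	model_pci_arch_init(&m);            /* LEVEL_ARCH */
	model_acpi_init(&m, notify_early);  /* LEVEL_SUBSYS */
	model_xen_mcfg_late(&m);            /* LEVEL_LATE */
	assert(m.pci_enumerated);           /* enumeration always precedes late calls */
	return m;
}
```

In this model, run_boot(false, false) ends with Xen notified only after
enumeration, while run_boot(false, true) notifies it beforehand. With
notify_early set, model_xen_mcfg_late() has nothing left to do unless
the list changes later, which is the hotplug case mentioned above.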

-boris

>
>>> b) given (a) we cannot do it ourselves before acpi_init and after is too
>>> late as we're already past ACPI PCI enumeration
>>> c) we'd have to do it in the same place I call xen_mcfg_late() and it'd
>>> be code duplication of what's already done by the existing code.
>>
>> If we manage to parse MCFG ourselves early then maybe we won't need
>> xen_mcfg_late()? We can call PHYSDEVOP_pci_mmcfg_reserved right away.
> Again, this cannot be done until acpi_init finishes basic setup to
> parse DSDT.
>
> Igor