Re: [PATCH 0/4] PCI: Continue E820 vs host bridge window saga

From: Hans de Goede
Date: Thu Dec 08 2022 - 15:20:27 EST


Hi,

On 12/8/22 21:06, Bjorn Helgaas wrote:
> On Thu, Dec 08, 2022 at 08:16:31PM +0100, Hans de Goede wrote:
>> Hi Bjorn,
>>
>> On 12/8/22 19:57, Bjorn Helgaas wrote:
>>> On Wed, Dec 07, 2022 at 04:31:12PM +0100, Hans de Goede wrote:
>>>> On 12/4/22 10:13, Hans de Goede wrote:
>>>>
>>>> <snip>
>>>>
>>>>>>> 2. I am afraid that now allowing PCI MMIO space to be allocated
>>>>>>> in regions marked as EfiMemoryMappedIO will cause regressions
>>>>>>> on some systems. Specifically when I tried something similar
>>>>>>> the last time I looked at this (using the BIOS date cut-off
>>>>>>> approach IIRC) there was a suspend/resume regression on
>>>>>>> a Lenovo ThinkPad X1 carbon (20A7) model:
>>>>>>>
>>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=2029207
>>>>>>>
>>>>>>> Back then I came to the conclusion that the problem is that not
>>>>>>> avoiding the EfiMemoryMappedIO regions caused PCI MMIO space to
>>>>>>> be allocated in the 0xdfa00000 - 0xdfa10000 range which is
>>>>>>> listed in the EFI memmap as:
>>>>>>>
>>>>>>> [ 0.000000] efi: mem46: [MMIO |RUN| | | | | | | | | | | | | ] range=[0x00000000dfa00000-0x00000000dfa0ffff] (0MB)
>>>>>>>
>>>>>>> And with current kernels with the extra logging added for this
>>>>>>> the following is logged related to this:
>>>>>>>
>>>>>>> [ 0.326504] acpi PNP0A08:00: clipped [mem 0xdfa00000-0xfebfffff window] to [mem 0xdfa10000-0xfebfffff window] for e820 entry [mem 0xdceff000-0xdfa0ffff]
>>>>>>>
>>>>>>> I believe patch 1/4 of this set will make this clipping go away,
>>>>>>> re-introducing the suspend/resume problem.
>>>>>>
>>>>>> Yes, I'm afraid you're right. Comparing the logs at comment #31
>>>>>> (fails) and comment #38 (works):
>>>>>>
>>>>>> pci_bus 0000:00: root bus resource [mem 0xdfa00000-0xfebfffff window]
>>>>>> pci 0000:00:1c.0: BAR 14: assigned [mem 0xdfa00000-0xdfbfffff] fails
>>>>>> pci 0000:00:1c.0: BAR 14: assigned [mem 0xdfb00000-0xdfcfffff] works
>>>>>>
>>>>>> Since 0xdfa00000 is included in the host bridge _CRS, but isn't
>>>>>> usable, my guess is this is a _CRS bug.
>>>>>
>>>>> Ack.
>>>>>
>>>>> So I was thinking to maybe limit the removal of EfiMemoryMappedIO
>>>>> regions from the E820 map if they are big enough to cause troubles?
>>>>>
>>>>> Looking at the EFI map MMIO regions on this Lenovo ThinkPad X1 carbon
>>>>> (20A7) model, they are tiny. Where as the ones which we know cause
>>>>> problems are huge. So maybe add a bit of heuristics to patch 1/4 based
>>>>> on the EfiMemoryMappedIO region size and only remove the big ones
>>>>> from the E820 map ?
>>>>>
>>>>> I know that adding heuristics like this always feels a bit wrong,
>>>>> because you end up putting a somewhat arbitrary cut off point in
>>>>> the code on which to toggle behavior on/off, but I think that in
>>>>> this case it should work nicely given how huge the EfiMemoryMappedIO
>>>>> regions which are actually causing problems are.
>>>
>>> I'll post a v2 that removes only regions 256KB or larger in a minute.
>>
>> Ok, may I ask why 256KB?
>>
>> I see that that rules out then troublesome MMIO regions from the X1 carbon from:
>> https://bugzilla.redhat.com/show_bug.cgi?id=2029207 :
>> efi: mem46: [MMIO|RUN| ] range=[0xdfa00000-0xdfa0ffff] (0MB) [64K]
>> which we know we need to avoid / keep reserved.
>>
>> But OTOH the reservations which are causing the problems with assigning
>> resources to PCI devices by Linux look like this:
>> efi: mem50: [MMIO |RUN| | | | | | | | | | | | |UC] range=[0x0000000065400000-0x00000000cfffffff] (1708MB)
>> which is significantly larger then 256KB.
>>
>> So we could e.g. also put the cut-off point at 16MB and still
>> remove the above troublesome reservation from the E820 table.
>> Note just thinking out loud here. I have no idea if 16MB
>> would be better...
>
> No good reason for 256KB. We know it needs to be at least 64KB for
> the X1 Carbon. I picked 4x bigger just for headroom, since I assume
> the 64KB is platform-specific host bridge registers or something. Do
> you think a bigger number would be better, i.e., we would retain more
> MMIO things in E820?
>
> ECAM areas would be 1MB per bus, so between 1MB and 256MB. Those areas
> *should* be reserved by PNP0C02 _CRS, but IIRC the early MMCONFIG code
> checks E820, and the late code checks for _CRS. I guess one could
> argue that ignoring those, e.g., by retaining anything 256MB or
> smaller in E820, would reduce the amount of change.

Right, reducing how much we change what the E820 map looks like after
this would be the main reason to make the cut of point bigger then
256KB.

> But if the host bridge _CRS includes 256MB of legitimate window that
> EFI says is MMIO and is hence included in E820, that seems like kind
> of a lot of usable window space to give up.

Ack, I guess we can just go with 256KB for now and then see how things go.

Regards,

Hans