Re: [Bugfix] x86/PCI/ACPI: Fix regression caused by commit 63f1789ec716

From: Bjorn Helgaas
Date: Tue Mar 24 2015 - 09:19:34 EST


On Mon, Mar 23, 2015 at 9:59 PM, Jiang Liu <jiang.liu@xxxxxxxxxxxxxxx> wrote:
> On 2015/3/24 10:42, Bjorn Helgaas wrote:
>> On Mon, Mar 23, 2015 at 9:22 PM, Jiang Liu <jiang.liu@xxxxxxxxxxxxxxx> wrote:
>>> On 2015/3/24 0:48, Bjorn Helgaas wrote:
>>>> On Mon, Mar 23, 2015 at 03:22:14PM +0800, Jiang Liu wrote:
>>>>> Commit 63f1789ec716("Ignore resources consumed by host bridge itself")
>>>>> tries to ignore resources consumed by PCI host bridge itself by
>>>>> checking IORESOURCE_WINDOW flag, which causes regression on some
>>>>> platforms.
>>>>
>>>> "Do. Or do not. There is no try."
>>>> [http://www.starwars.com/video/do-or-do-not]
>>>>
>>>> That commit doesn't *try* to do something. It *does* something. Just
>>>> explain what it does and what's wrong with what it does.
>>>>
>>>>> For example, PC Engines APU.1C platform defines PCI MMIO resources with
>>>>> ACPI Memory32Fixed operator as below:
>>>>> Name (CRES, ResourceTemplate ()
>>>>> {
>>>>> ...
>>>>> WordIO (ResourceProducer, MinFixed, MaxFixed, PosDecode,
>>>>> 0x0000, // Granularity
>>>>> 0x0D00, // Range Minimum
>>>>> 0xFFFF, // Range Maximum
>>>>> 0x0000, // Translation Offset
>>>>> 0xF300, // Length
>>>>> ,, , TypeStatic)
>>>>> Memory32Fixed (ReadOnly,
>>>>> 0x000A0000, // Address Base
>>>>> 0x00020000, // Address Length
>>>>> )
>>>>> Memory32Fixed (ReadOnly,
>>>>> 0x00000000, // Address Base
>>>>> 0x00000000, // Address Length
>>>>> _Y00)
>>>>> })
>>>>>
>>>>> Memory32Fixed operator doesn't support concept of "producer/consumer"
>>>>> and it will be treated as "consumer" by the ACPI resource parsing
>>>>> interface, thus cause regression. So the fix is only to check
>>>>> "producer/consumer" flag for resources having "producer/consumer" flag.
>>>>
>>>> Apparently the problem is with the Memory32Fixed resources above; it sounds
>>>> like we ignore them after 63f1789ec716? I don't quite understand how this
>>>> fix works. acpi_dev_filter_resource_type() has cases for both
>>>> ACPI_RESOURCE_TYPE_FIXED_MEMORY32 and ACPI_RESOURCE_TYPE_ADDRESSxx, but
>>>> this patch only touches the latter, not the
>>>> ACPI_RESOURCE_TYPE_FIXED_MEMORY32 case.
>>> The idea is:
>>> 1) caller specifies IORESOURCE_WINDOW to query resources provided
>>> by the device, otherwise it's querying resources consumed by
>>> the device.
>>> 2) For resource descriptors having producer/consumer flag, such as
>>> ACPI_RESOURCE_TYPE_ADDRESSxx, we check the producer/consumer flag.
>>> 3) For resource descriptors not having producer/consumer flag, such
>>> as ACPI_RESOURCE_TYPE_FIXED_MEMORY32, we skip checking the
>>> producer/consumer flag.
>>
>> I figured out that much by reading the code. But I think the code is
>> very hard to read, and I still don't really understand how it works.
>>
>> Before this fix, we ignore Memory32Fixed resources. After this fix,
>> we use Memory32Fixed as a window. I think that's an incorrect
>> interpretation of Memory32Fixed.
> Hi Bjorn,
> I think it's illegal to use Memory32Fixed for PCI host
> bridge resource window too. The fix is to solve the regression
> by relaxing constraints, but that may not be the right solution.

Relaxing the constraint for all platforms is only a solution if we're
confident that it works for all platforms. I'm not confident because
I don't know what effect this patch has on systems that use
Memory32Fixed correctly.

It's possible that you can analyze the behavior and explain that if
you relax the constraint, the Memory32Fixed handling (both for bridge
and non-bridge devices) will be the same as it was before
63f1789ec716. If you can do that, I think it would be reasonable
because you'd be preserving the previous, known-working behavior.

If you go that route, I'd like to see a patch that explicitly touches
the Memory32Fixed handling. The current patch claims to change the
way we handle Memory32Fixed, but nothing in the code change is
directly related to Memory32Fixed, so it's very confusing.

> How about waiting for a while to see whether there are more
> bug reports related to this. If only limited platform affected,
> we could treat it as BIOS bugs and use quirk to handle it.
> Otherwise we may need to relax the constraint.

I don't think waiting is a good strategy. We know we have a
regression, and I think we need to fix that ASAP without waiting for
more failure reports.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/