Re: [BUG] Bisected Problem with LSI PCI FC Adapter

From: Bjorn Helgaas
Date: Mon Sep 22 2014 - 11:23:41 EST


On Mon, Sep 22, 2014 at 8:53 AM, Andreas Noever
<andreas.noever@xxxxxxxxx> wrote:
> On Mon, Sep 22, 2014 at 4:25 PM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote:
>> On Sat, Sep 20, 2014 at 12:41 PM, Dirk Gouders <dirk@xxxxxxxxxxx> wrote:
>>> Bjorn Helgaas <bhelgaas@xxxxxxxxxx> writes:
>>>
>>>> On Sat, Sep 13, 2014 at 09:41:34PM +0200, Dirk Gouders wrote:
>>>>> So, I did some tests on the VX50, which probably wasn't the worst idea,
>>>>> because it behaves differently from the test machine.
>>>>>
>>>>> Summary:
>>>>>
>>>>> 1) Bjorn's back pocket patch works on the VX50.
>>>>>
>>>>> On the test machine it causes a trace that involves mount_root.
>>>>> I tried to use netconsole, but it complained that the interface
>>>>> was not ready.
>>>>
>>>> OK, that's good. I put this revert patch in for-linus for v3.17. I regard
>>>> this as a temporary fix, not the real solution. My guess is the test
>>>> machine doesn't boot because you're missing a driver, so not related to the
>>>> revert patch.
>>>
>>> I assumed my limit-host-bridge-aperture-and-ignore-bridges-patch on top
>>> of your patch caused this, so I took a closer look.
>>>
>>> Your patch works fine with current rc5+ on the test machine -- with and
>>> without my additional patch.
>>
>> Great, thanks for testing that!
>>
>>> Various other test results from today (VX50) will be appended to
>>> bugzilla in a few moments.
>>
>> The Windows Server 2008 boot shows that Windows reconfigures the
>> 00:0e.0 bridge so it fits inside the [bus 00-07] aperture reported by
>> the host bridge _CRS, and the LSI FC adapter is not enumerated at all.
>> That's basically the same behavior that prompted your bug report.
>> This suggests that Windows does *not* reset the secondary bus when
>> changing the bridge configuration.
>>
>> For v3.17, I reverted 1820ffdccb9b ("PCI: Make sure bus number
>> resources stay within their parents bounds"). For the future, I think
>> we should do a quirk to fix the _CRS, similar to what Andreas has
>> posted, and apply 1820ffdccb9b again.
>>
>> But I think the quirk should be specific to this system and BIOS. I
>> don't want to add a workaround that silently covers up Linux and BIOS
>> bugs. amd_bus.c exists because Linux was not smart enough to pay
>> attention to _CRS. Linux is now pretty good at that, so the reason
>> for amd_bus.c is mostly gone. I don't want to add new dependencies
>> on amd_bus.c that will prevent us from phasing it out.
> Why not always trust amd_bus over _CRS? Is there a scenario in which
> amd_bus is wrong?

amd_bus.c requires ongoing maintenance to keep it working for new
processors and topologies. The ACPI description of the platform is
the one the OEM intended, and it's the one that is tested. There are
cases where the ACPI description omits things that amd_bus.c would
find, e.g., when the BIOS reserves hardware that it doesn't want the
OS to touch.

> Are these methods (like _CRS) meant to set limits for us, or are they
> simply used to report the configuration decisions made by the BIOS? So
> if _CRS says that the window is [00-07] would it be ok for us to
> simply increase it (possibly after reprogramming the registers in
> amd_bus)?

_CRS tells us how a device is configured. _PRS tells us what other
settings are possible. _SRS chooses other settings. If BIOS supplies
_PRS and _SRS, we can change the settings. But I've never seen _PRS
and _SRS for host bridges, and Linux doesn't support them for host
bridges today.
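
To make the division of labor concrete, here is a minimal sketch of the
rule above. It is an illustration only, not kernel code: the struct and
function names are hypothetical, and it models nothing beyond "settings
can be changed only when BIOS supplies both _PRS and _SRS".

```c
#include <stdbool.h>

/* Hypothetical model of which resource methods BIOS supplies for a device. */
struct acpi_res_methods {
	bool has_crs;	/* _CRS: reports how the device is configured now */
	bool has_prs;	/* _PRS: reports what other settings are possible */
	bool has_srs;	/* _SRS: selects one of those other settings */
};

/*
 * The OS may change the settings only if BIOS supplies both _PRS and
 * _SRS. With _CRS alone, the reported configuration is simply what the
 * OS has to work with.
 */
bool os_may_reconfigure(const struct acpi_res_methods *m)
{
	return m->has_prs && m->has_srs;
}
```

A host bridge that supplies only _CRS, which is the only kind I have
seen, makes os_may_reconfigure() return false.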

It would not be OK for us to use amd_bus.c to reprogram registers.
The rest of ACPI assumes that the bridge is configured per _CRS, and
it assumes that any changes are done via _SRS. For example, there
could be AML that uses those assumptions.
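
As a rough illustration of what "configured per _CRS" means for bus
numbers, the sketch below checks whether a bridge's bus range fits
inside a host bridge aperture such as the [bus 00-07] window discussed
in this thread. The type and function names are hypothetical, this is
not the code from 1820ffdccb9b, and any bus numbers other than 00-07
that you feed it are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inclusive bus number range, e.g. [bus 00-07]. */
struct bus_range {
	uint8_t start;
	uint8_t end;
};

/*
 * Return true if 'child' is a valid range that lies entirely inside
 * 'parent'. A bridge whose secondary..subordinate range fails this
 * check is outside the window the host bridge _CRS reports.
 */
bool bus_range_within(const struct bus_range *parent,
		      const struct bus_range *child)
{
	return child->start <= child->end &&
	       child->start >= parent->start &&
	       child->end <= parent->end;
}
```

With a parent of { 0x00, 0x07 }, a child of { 0x01, 0x07 } fits and a
child of { 0x0a, 0x0a } does not.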

I don't think a hybrid ACPI + native solution is really viable.
One-off bug workarounds are fine, but in the long run, I think we're
better off if we work from the same system description that Windows
does. Otherwise we'll continually trip over unexpected things.

Bjorn