Re: [PATCH] ACPI: don't show an error when we're not in charge of PCIe hotplug.

From: Rafael J. Wysocki
Date: Tue Jun 21 2016 - 18:55:39 EST


On Tue, Jun 21, 2016 at 8:07 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> On Tue, Jun 21, 2016 at 11:01 AM, <Mario_Limonciello@xxxxxxxx> wrote:
>>> -----Original Message-----
>>> From: Peter Jones [mailto:pjones@xxxxxxxxxx]
>>> Sent: Tuesday, June 21, 2016 10:19 AM
>>> To: Rafael J. Wysocki <rafael@xxxxxxxxxx>
>>> Cc: ACPI Devel Maling List <linux-acpi@xxxxxxxxxxxxxxx>; Limonciello, Mario
>>> <Mario_Limonciello@xxxxxxxx>; Linux Kernel Mailing List <linux-
>>> kernel@xxxxxxxxxxxxxxx>; Len Brown <lenb@xxxxxxxxxx>; Rafael J . Wysocki
>>> <rjw@xxxxxxxxxxxxx>; Andy Lutomirski <luto@xxxxxxxxxxxxxx>
>>> Subject: Re: [PATCH] ACPI: don't show an error when we're not in charge of
>>> PCIe hotplug.
>>>
>>> (Sorry for the slow response - it's deadline time over here.)
>>>
>>> On Thu, Jun 16, 2016 at 04:56:57PM +0200, Rafael J. Wysocki wrote:
>>> > On Thu, Jun 16, 2016 at 2:12 AM, Rafael J. Wysocki <rafael@xxxxxxxxxx>
>>> wrote:
>>> > > On Thu, Jun 16, 2016 at 12:15 AM, Peter Jones <pjones@xxxxxxxxxx>
>>> wrote:
>>> > >> Right now when booting, on many laptops the firmware manages the
>>> PCIe
>>> > >> bus. As a result, when we call the _OSC ACPI method, it returns an
>>> > >> error code. Unfortunately the errors are not very articulate.
>>> > >
>>> > > What exactly do you mean here?
>>> > >
>>> > >> As a result, we show:
>>> > >>
>>> > >> ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-fe])
>>> > >> acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM
>>> Segments MSI]
>>> > >> \_SB_.PCI0 (33DB4D5B-1FF7-401C-9657-7441C03DD766): _OSC invalid
>>> UUID
>>> > >> _OSC request data: 1 1f 0
>>> > >
>>> > > So _OSC told us that the UUID was invalid, didn't it?
>>> >
>>> > BTW, the above messages are KERN_DEBUG, so at least in theory they
>>> > shouldn't be visible in production runs.
>>> >
>>> > Maybe the bug to fix is that they show up when they aren't supposed to?
>>>
>>> No - the workflow that I am really trying to remedy is this:
>>>
>>> 1) S3 resume sometimes isn't working on some laptop you've got.
>>> 2) start looking at debug messages
>>> 3) this shows an error, so it looks like it's probably the problem
>>> 4) go fishing for red herring
>>> 5) if you happen to know who maintains the DSDT for the platform in
>>> question, eventually work out that this is working as intended and
>>> the bug is someplace else.
>>> 5b) if you don't know that person, eventually work out that it /might/
>>> be someplace else...
>>>
>>> So the idea was to make it look more like an indication of status, and
>>> less like an error that's causing unrelated problems.
>>>
>>> When I talked to Mario at Dell (Cc'd), it wasn't clear to us that
>>> there's a way to distinguish the between the UUID being
>>> invalid/malformed, being merely unsupported, or being supported in some
>>> configurations but not the current one. In this particular DSDT, the
>>> machine doesn't support the OS controlling any of this if USB-C /
>>> thunderbolt are enabled. The DSDT is clearly written with the belief
>>> that you have to completely disable the handling for that UUID in this
>>> case, and googling for this looks like it's not the only one written
>>> with that belief.
>>>
>>> Reading the spec (v6.1, sections 6.2.11.3 and 6.2.11.4), it seems
>>> plausible that you can express this instead by handling the UUID but
>>> choosing each individual query/status bit in the way that accomplishes
>>> the OS doing nothing with the response. So it may well be that that's
>>> just more code that vendors have thought wasn't necessary (or wasn't
>>> correct for some reason.)
>>>
>>> Mario, want to jump in on your thinking here?
>>>
>>> --
>>> Peter
>>
>> After talking to the team, I was told this particular implementation to not let
>> OS take control when acting on that specific UUID based upon a variable
>> (NEXP in this case) came from Intel RC code.
>>
>> That's probably why this is all across a lot of platforms, including non-Dell.
>>
>> At least in the context of the laptop Peter noticed this on (Dell XPS 13 9350)
>> NEXP is set in GNVS based upon Thunderbolt capability.
>>
>> As for why they return unrecognized UUID instead of just masking all the
>> capabilities bits? It's the same net functional result. If the vendor provided
>> RC code doesn't caused WCHK problems or functional problems it's hard to
>> make a case for why it needs to be changed by the OEM.
>>
>> I think that Peter's patch is appropriate to message this is specifically
>> what's going on.
>>
>
> In general, it seems that Windows is considerably less inclined to
> throw out errors when the DSDT contains spec violations. It would be
> great if Microsoft were to tighten up their requirements :)

Agreed (but I speak for myself only as usual).

> I wonder if Intel could be persuaded to add an ACPI mechanism to grant
> PCIe native control per device or per bus instead of more-or-less
> globally as it is now.

I guess at least in some cases it just has to be done that way.

Thanks,
Rafael