RE: [PATCH] ACPI: don't show an error when we're not in charge of PCIe hotplug.

From: Mario_Limonciello
Date: Tue Jun 21 2016 - 14:04:34 EST


> -----Original Message-----
> From: Peter Jones [mailto:pjones@xxxxxxxxxx]
> Sent: Tuesday, June 21, 2016 10:19 AM
> To: Rafael J. Wysocki <rafael@xxxxxxxxxx>
> Cc: ACPI Devel Maling List <linux-acpi@xxxxxxxxxxxxxxx>; Limonciello, Mario
> <Mario_Limonciello@xxxxxxxx>; Linux Kernel Mailing List <linux-
> kernel@xxxxxxxxxxxxxxx>; Len Brown <lenb@xxxxxxxxxx>; Rafael J . Wysocki
> <rjw@xxxxxxxxxxxxx>; Andy Lutomirski <luto@xxxxxxxxxxxxxx>
> Subject: Re: [PATCH] ACPI: don't show an error when we're not in charge of
> PCIe hotplug.
>
> (Sorry for the slow response - it's deadline time over here.)
>
> On Thu, Jun 16, 2016 at 04:56:57PM +0200, Rafael J. Wysocki wrote:
> > On Thu, Jun 16, 2016 at 2:12 AM, Rafael J. Wysocki <rafael@xxxxxxxxxx>
> wrote:
> > > On Thu, Jun 16, 2016 at 12:15 AM, Peter Jones <pjones@xxxxxxxxxx>
> wrote:
> > >> Right now when booting, on many laptops the firmware manages the
> PCIe
> > >> bus. As a result, when we call the _OSC ACPI method, it returns an
> > >> error code. Unfortunately the errors are not very articulate.
> > >
> > > What exactly do you mean here?
> > >
> > >> As a result, we show:
> > >>
> > >> ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-fe])
> > >> acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM
> Segments MSI]
> > >> \_SB_.PCI0 (33DB4D5B-1FF7-401C-9657-7441C03DD766): _OSC invalid
> UUID
> > >> _OSC request data: 1 1f 0
> > >
> > > So _OSC told us that the UUID was invalid, didn't it?
> >
> > BTW, the above messages are KERN_DEBUG, so at least in theory they
> > shouldn't be visible in production runs.
> >
> > Maybe the bug to fix is that they show up when they aren't supposed to?
>
> No - the workflow that I am really trying to remedy is this:
>
> 1) S3 resume sometimes isn't working on some laptop you've got.
> 2) start looking at debug messages
> 3) this shows an error, so it looks like it's probably the problem
> 4) go fishing for red herring
> 5) if you happen to know who maintains the DSDT for the platform in
> question, eventually work out that this is working as intended and
> the bug is someplace else.
> 5b) if you don't know that person, eventually work out that it /might/
> be someplace else...
>
> So the idea was to make it look more like an indication of status, and
> less like an error that's causing unrelated problems.
>
> When I talked to Mario at Dell (Cc'd), it wasn't clear to us that
> there's a way to distinguish the between the UUID being
> invalid/malformed, being merely unsupported, or being supported in some
> configurations but not the current one. In this particular DSDT, the
> machine doesn't support the OS controlling any of this if USB-C /
> thunderbolt are enabled. The DSDT is clearly written with the belief
> that you have to completely disable the handling for that UUID in this
> case, and googling for this looks like it's not the only one written
> with that belief.
>
> Reading the spec (v6.1, sections 6.2.11.3 and 6.2.11.4), it seems
> plausible that you can express this instead by handling the UUID but
> choosing each individual query/status bit in the way that accomplishes
> the OS doing nothing with the response. So it may well be that that's
> just more code that vendors have thought wasn't necessary (or wasn't
> correct for some reason.)
>
> Mario, want to jump in on your thinking here?
>
> --
> Peter

After talking to the team, I was told this particular implementation to not let
OS take control when acting on that specific UUID based upon a variable
(NEXP in this case) came from Intel RC code.

That's probably why this is all across a lot of platforms, including non-Dell.

At least in the context of the laptop Peter noticed this on (Dell XPS 13 9350)
NEXP is set in GNVS based upon Thunderbolt capability.

As for why they return unrecognized UUID instead of just masking all the
capabilities bits? It's the same net functional result. If the vendor provided
RC code doesn't caused WCHK problems or functional problems it's hard to
make a case for why it needs to be changed by the OEM.

I think that Peter's patch is appropriate to message this is specifically
what's going on.