Re: [PATCH 0/1] platform/x86: Add Lenovo Legion WMI Drivers
From: Antheas Kapenekakis
Date: Fri Dec 27 2024 - 20:10:12 EST
On Sat, 28 Dec 2024 at 00:22, Derek John Clark
<derekjohn.clark@xxxxxxxxx> wrote:
>
> On Fri, Dec 27, 2024 at 10:48 AM Antheas Kapenekakis <lkml@xxxxxxxxxxx> wrote:
> Subject: Re: [PATCH 1/1] platform/x86: Add lenovo-legion-wmi drivers
>
> >Hi Derek,
>
> Seasons Greetings, Antheas.
>
> >Good job on the driver. I spent a bit of time reviewing it today, and I
> >have a few thoughts. Hopefully you can go through them fast, and we can
> >have a solid base of understanding moving forward so that I can e.g.,
> >merge your driver on Bazzite ahead of the kernel, so you can get some
> >valuable testing.
> >
> >// Firmware Attributes
> >Right now, I have one major concern I'd like to be addressed before
> >moving into the details of the driver. This is mirrored in the Asus
> >driver you referenced [1] as well and I do not think I have seen solid
> >argumentation for it yet.
> >
> >Essentially, you are using the firmware attributes interface for the
> >ephemeral attributes SPL, SPPT, and FPPT which is also a concern mirrored
> >in the thread with Armin (+cc) and John (+cc) [2]. The question here
> >becomes, if exposing fan curve attributes via the firmware interface is
> >fickle, why is it correct for SPL, SPPT, and FPPT?
>
> See [1] and [2] below. Hans de Goede and Ilpo Järvinen signed off on using it
> this way for the asus-armoury driver, as proposed by Mario. This class is the
> "most" correct one currently present in the kernel. If you think an alternative
> solution would make more sense then I would encourage you to create that
> interface and submit it. In the meantime I am going to move forward with what
> has already been discussed and, if you manage to get in in before this series
> is approved, then I will adjust. Otherwise, please CC me on the patch that
> changes this driver to include it.
They need to re-agree given the context below. It is one thing for
them to agree on it theoretically for settings that they might imagine
are persistent and another thing when in reality they are not.
You did not address this in your comment here.
The problem I am raising is that SPL, SPPT, and FPPT specifically are
not persistent enough to meet the guidelines of that interface.
[1] and [2] do not address this either.
I do not have an alternative planned, just noting that I'd like
everyone to be on the same page before we go ahead with this ABI.
> >For context, when it comes to Asus (see [1]), these tunables reset during
> >reboots, after sleep, and when changing the TDP mode. Specifically, TDP
> >mode, SPL, SPPT, and FPPT, and the fan curve are not preserved between
> >reboots, with only TDP mode being preserved between sleep. Following,
> >setting TDP mode resets SPL, SPP, and FPPT, and setting any one of those
> >resets the fan curve.
> >
> >For Lenovo and the Legion Go (1st Gen) it actually depends on the BIOS
> >version. In early BIOSes, the custom tunings used to be preserved until
> >you reset the BIOS. However, I suspect that this caused used confusion, as
> >if you uninstalled Legion Space, the tunings to custom mode would still be
> >applied which would be unintuitive. In new BIOS versions, the tunings reset
> >when changing TDP modes, which can be done with the combination Legion
> >L + Y. I am unsure if they are still preserved between reboots if you
> >remain in custom mode. Fan curve resets as well. But the point is that they
> >are not persisted, at least to the extent expected by the firmware
> >attributes class and writing to them in other modes is undefined.
> >
> >Lastly, if another driver were to be developed e.g., for AMD out-of-tree
> >to control devices without a vendor interface, we'd have the same
> >issue. As this driver or [1] merging would set a precedent for using
> >firmware attributes for these tunings, I think it is important to reach
> >agreement before moving forward. For AMD w/o vendor, more details can
> >be found in [3]. In [3], Mario makes an RFC for an alt interface for SPL,
> >SPPT, and FPPT, through amd_pmf which does not have this peculiar issue.
> >
> >There are settings where it does make sense however, such as VRAM, boot
> >sound (Asus), and the suspend light/barrel plug RGB (Legion). These
> >settings are typically persisted in BIOS and there is no ambiguity for
> >fitness in those.
> >
>
> >// Driver Details
> >Ok, as for the rest of the driver, I have (i) some stylistic comments and
> >(ii) will mirror similar concerns to John in unnecessary accesses.
> >
> >(i) Stylistic Comments
> >I would personally be a fan of adding WMI support to the kernel, such that
> >userspace can access WMI attributes without the use of a driver, as it is
> >done in Windows. However, as I have discussed with Mario, such a thing is
> >not an easy task, as it would require adding a BMOF parser to the kernel
> >which is a monumental effort. In any case, the current kernel requires
> >us custom write drivers for the WMI interface.
> >
> >The reason for this preface is that I think that your driver style is a bit
> >"too close" to what such a thing would look. E.g., in the driver I can see
> >snippets such as `LENOVO_CAPABILITY_DATA_01 WMI data block driver.` and
> >`MODULE_IMPORT_NS("CAPDATA_WMI");`. Since you are going through the effort
> >of writing a custom driver, I would be more opinionated in how I'd design
> >the driver, so that it's more intuitive from a user's perspective. I think
> >John's driver (which has its own issues) and the asus-wmi drivers strike a
> >bit better balance, where they "translate" the WMI calls into an ABI that
> >can be documented and then parsed by a developer.
>
> I'm not sure what you are talking about exactly. Can you provide some sort of
> example? The scope is quite specific as it is, and will be even more specific
> once I include the requested changes. There is no ambiguity for the user on
> what /sys/firmware/acpi/platform_profile or /sys/class/firmware-attributes/
> lenovo-wmi-gamezone/attributes/ppt_pl1_spl are. As Armin suggested and I ack'd,
> capdata01 and Other Method drivers will be linked as components in v2.
To rephrase, your ABI style is not intuitive, because it contains
implementation details such as "gamezone", "capdata01", and "Other
Method", in addition to the ABI being hardcoded to the WMI structure
lenovo uses. The documentation uses those keywords as well.
I just brought up that you could have taken different design choices
to have a cleaner API, and at the same time avoided comments such as
the IdeaPad one or using the Legion name.
If I understand correctly your last sentence, Armin suggested much of
the same (ie combine and merge).
> >Such a design process would then also allow to claim the name legion, as
> >you could make sure the legion driver only loads on Legion laptops, where
> >now it would randomly load in other laptops as well. Sidenote here is
> >that this is something I also found confusing, as the Legion Go does not
> >have a keyboard so that driver should definitely not load.
>
> This is a consequence of the WMI design in the kernel as is. WMI GUID tables
> only load on devices that implement them. If a device erroneously reports an
> unimplemented GUID that is a problem with the BIOS and it needs (preferred) a
> BIOS update and/or a DMI quirk to block it if loading it creates an issue. This
> will be device specific.
An industry practice cannot be a BIOS bug. If Lenovo has a practice of
using those interfaces in all of their BIOSes to allow for future
extension you'd need to take account of that.
Since there are Lenovo reps here they are more equipped to comment on
it than I am.
GUID tables loading != drivers loading also, I would not pin that on
the kernel. You could bail on a stub implementation or require
multiple interfaces be present.
> >E.g., you could only load when you detect the Gamezone interface and then
> >access the other two as well. The Gamezone interface is only used in
> >Legion laptops AFAIK. I think Lenovo also kind of implements all three
> >as stubs even if some are not used, to allow extending them with BIOS
> >updates, so blindly loading drivers based on the interface might not be
> >the best idea in any case. When the Legion Go released, I think only one
> >of the three interfaces was used, for example, but all three were present.
>
> If we run into a situation where that check passes on a device because the
> interface produces a valid ACPI object that says it is supported even though
> the interface isn't fully implemented then we can quirk it. Also, this is
> contradictory to your point (ii) where you insist these calls are unnecessary.
For the first part, see above. As for (ii) I only said they are
unnecessary during writing, not during probe or deriving _min, _max
which would be done once by userspace software.
> >(ii) Unnecessary Accesses
> >Even though you went hands off on your design of the driver, at the same
> >time you held back when it comes to userspace accesses by globally forcing
> >the Other function hints as limits. This creates two issues.
> >
> >The first one is that those limits might be wrong for certain devices or
> >certain users might want to go a bit above them, which would mean that if
> >you enforce them you'd need to provide a way to quirk them and a module
> >parameter override (I know they are correct for the Legion Go as I have
> >looked through them as well). But if it is not necessary in Windows,
> >why would we add additional roadblocks in the kernel? If Lenovo feels the
> >need to enforce them, they can do so in their firmware and extend that
> >protection to Windows as well. Asus does when it comes to their fan curve
> >firmware, for example.
>
> Unless you have some concrete examples of this actually occuring I'm inclined
> to assume that Lenovo knows how to implement their interface. If we do find a
> device that has a broken CAPDATA interface, again we can quirk it.
> Notwithstanding that, the data exists and it ensures that the interface isn't
> abused to the point that hardware is pushed beyond its design limitations.
> Mario or Xino can correct me here, but I expect AMD and Lenovo have an interest
> in ensuring the kernel doesn't permit settings that can lead to additional
> RMAs. For users who wish to disregard that there are solutions to modify this
> data which void the warranty, as you are well aware. I'll also point out that
> Lenovo controls the userspace software in Windows and never intended to support
> Linux originally, so it is reasonable to assume this was just an oversight in
> the design and they didn't expect anyone else would implement another WMI driver
> or software when it was designed. In any case, My primary concern is that the
> interface is safe and reliable.
WMI functions are accessible to all software in WIndows without any
limits and there are multiple WIndows software for Legion and Asus
devices that use those interfaces without limits.
So if it is a problem it is in the interest of Lenovo to fix this in
their BIOS, so that they remove that liability from both OSs. Here I
have to say that the max TDP of the device is limited by AMD CBS and
Lenovo has implemented overheating protection so it is actually quite
safe already.
So it is your choice to take on that overhead. If you do take it on,
make sure you dont do extra WMI calls and introduce a quirk system.
> >The second is that you are making a lot of necessary calls, which may harm
> >performance or potentially cause instability. My workflow for setting TDP
> >on the Go is that I first respect the TDP mode the user has set using
> >Legion L + Y. If the user wants to change that, then I do the following:
> >set TDP mode to custom, set the TDP, set the SPL, SPPT, and FPPT, and then
> >set the fan curve. I also add a small delay in-between each of these calls
> >as a further precaution. In your current driver, each of these calls would
> >make two additional calls to the WMI interface (one for the limits and
> >one for the TDP mode), which would more than double the number of calls
> >made in a typical scenario (from 5 to 12), where each triplet is made
> >back to back.
> >
> >There is also the issue mentioned by John, where you do cross-interface
> >interactions etc. To fix this, you'd have to retrieve the limits from
> >firmware on probing and then cache them in the driver after quirking. It
> >would be much easier just to skip all of this for now and just use them to
> >prepopulate _min and _max values as hints instead, which you already do.
>
> I had already planned on making this change for other reasons, but I want to
> dispel some misinformation you have presented here. First, I highly doubt
> anyone is making enough calls to this interface that the extra 4ns are going
> to even be noticed. I would say they are certainly using the interface
That's hardly an argument for making unnecessary calls unfortunately.
> incorrectly if they do. Second, LENOVO_CAPABILITY_DATA_01 is a separate
> interface, and it is (currently) only called once per attribute change. For
> some background, this was originally added when I was permitting the current
> fan profile to determine which capability data page was presented and set. In
> the submitted version I hard code the page to custom as testing under quiet,
> balanced, and performance showed no change.
I do not understand what "I hard code the page to custom" means.
If you mean the capability data does not change you are right, they
are hardcoded in the decompiled ACPI I am pretty sure (it has been
close to a year now so I might be forgetting).
> Technically it is an assumption
> that this will be universally true for all Legion hardware, but the change was
> made as the final approved version of Mario's patches which added the necessary
> custom profile to platform_profile explicitly states in the documentation that
> "custom" mode is not user selectable. As Armin pointed out, this only applies to
> the legacy interface in /sys/firmware. In any case, you need not worry about
> this as v2 will not need this call more than once during probe, and gamezone
> will not be called by Other Method ever.
When reviewing your code, I saw that it does two checks: one for
checking whether in custom mode and bailing, and one for checking caps
and then bailing. Which equals 3 calls.
It's good that you will be fixing that/I hope you will be fixing that.
It is not clear from your comment. Please try to skip the capability
call too.
If I missed anything lmk
Antheas
> >Those were my comments. You mentioned that you are travelling and might not
> >have access to your PC so take your time with the replies.
> >
> >Happy holidays,
> >Antheas
>
>
> [1] https://lore.kernel.org/all/e9f4fb37-5277-a7f0-2bec-8a6909b4e674@xxxxxxxxxxxxxxx/
> [2] https://lore.kernel.org/all/07fe1bf8-f470-4e40-86ba-0f8d2e61a3e6@xxxxxxx/
>
> Cheers,
> Derek