Re: [PATCH v2] platform/x86: add Gigabyte WMI temperature driver

From: Thomas Weißschuh
Date: Thu Apr 08 2021 - 10:55:00 EST


Hi Hans,

On Do, 2021-04-08T11:36+0200, Hans de Goede wrote:
> On 4/7/21 9:43 PM, Thomas Weißschuh wrote:
> > Hi Hans,
> >
> > On Mi, 2021-04-07T17:54+0200, Hans de Goede wrote:
> >> Thank you for your new driver and thank you for the quick respin
> >> addressing Barnabás' request to make it a WMI driver.
> >>
> >> The code looks good, so merging this should be a no-brainer,
> >> yet I'm not sure if I should merge this driver as-is, let me
> >> explain.
> >
> > thanks for the encouraging words.
> >
> >> The problem is that I assume that this is based on reverse-engineering?
> >
> > Yes, it is completely reverse-engineered.
> > Essentially I stumbled upon Matthews comment at
> > https://bugzilla.kernel.org/show_bug.cgi?id=204807#c37 and went from there.
> >
> >> We have some mixes experiences with reverse-engineered WMI drivers,
> >> sometimes a feature is not supported yet the wmi_evaluate_method()
> >> call happily succeeds. One example of this causing trouble is:
> >>
> >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1797d588af15174d4a4e7159dac8c800538e4f8c
> >
> > There actually are reports of recent, similar mainboards with recent firmware and
> > similar sensor chips that do not support the temperature query.
> > (https://github.com/t-8ch/linux-gigabyte-wmi-driver/issues/3 and
> > https://github.com/t-8ch/linux-gigabyte-wmi-driver/issues/2)
> >
> > Unfortunately for unknown WMI queries the firmware does not return any value.
> > This ends up as an ACPI integer with value 0 on the driver side.
> > (Which I could not yet find documentation for if that is expected)
> > In the current version of the driver EIO is returned for 0 values which
> > get translated to N/A by lm-sensors.
> >
> >> At a minimum I think your driver should check in its
> >> probe function that
> >>
> >> gigabyte_wmi_query_integer(GIGABYTE_WMI_TEMPERATURE_QUERY, ...)
> >>
> >> actually succeeds on the machine the driver is running on chances
> >> are that Gigabyte has been using the DEADBEEF-2001-0000-00A0-C90629100000
> >> GUID for ages, where as the 0x125 value for GIGABYTE_WMI_TEMPERATURE_QUERY
> >> suggests that this is a pretty new API.
> >
> > Would it be enough to probe all six sensors and check if all return 0?
>
> I think that failing the probe with -ENODEV, with a dev_info() explaining why when
> all six sensors return 0 would be good yes, that should also fix those 2
> issues on https://github.com/t-8ch/linux-gigabyte-wmi-driver/.

I added such a validation step.

> >> It would be good if you can see if you can find some DSDT-s for older
> >> gigabyte motherboards attached to various distro's bug-tracking systems
> >> or forum posts and see how those respond to an unknown gigabyte_wmi_commandtype.
> >
> > Will do.
>
> Since you alreayd have bugreports of boards where this does not work,
> please don't spend too much time on this. I guess those older DSDT-s will
> also just return an integer with value 0.

Ok.

> >> Another open question to make sure this driver is suitable
> >> as a generic driver (and does not misbehave) is how to figure out
> >> how many temperature sensors there actually are.
> >
> > So far I could not find out how to query this from the firmware.
> > The IT8688 chip can report the state of each sensor but that is not exposed by
> > the firmware.
> > But even the state information from the IT8688 is not accurate as is.
> > One of the sensors that is reported as being active (directly via it87) on my
> > machine always reports -55°C (yes, negative).
>
> Ok.
>
> >> Perhaps the WMI interface returns an error when you query an out-of-range
> >> temperature channel?
> >
> > Also "0" as mentioned above.
>
> Hmm, so maybe this can be used to limit the amount of reported temperature
> sensors, IOW if sensors 5 and 6 report 0, only register 4 sensors ?

So far the 0-returning sensors have not been at the end of the list but in the
middle. Is it worth building logic to properly probe a bitmask of useful
sensors?

> >> One option here might be to add a DMI matching table and only load on
> >> systems on that table for now. That table could then perhaps also provide
> >> labels for each of the temperature channels, which is something which
> >> would be nice to have regardless of my worries about how well this driver
> >> will work on motherboards on which it has not been tested.
> >
> > I am collecting reports for working motherboards at
> > https://github.com/t-8ch/linux-gigabyte-wmi-driver/issues/1 .
>
> Good, you should probably ask reporters there to provide the output of:
>
> grep . /sys/class/dmi/id/* 2> /dev/null

I added a DMI-based whitelist and asked users to submit their DMI information.

> Ran as a normal user (so that the serial-numbers will be skipped) so that
> you will have DMI strings to match on if you decide to go that route.

The serials seem not to be too critical on these boards:

/sys/class/dmi/id/board_serial:Default string
/sys/class/dmi/id/chassis_serial:Default string
/sys/class/dmi/id/product_serial:Default string

> > As mentioned in the initial patch submission there would be different ways to
> > access this information firmware:
> >
> > * Directly call the underlying ACPI methods (these are present in all so far
> > observed firmwares, even if not exposed via WMI).
> > * Directly access the ACPI IndexField representing the it87 chip.
> > * Directly access the it87 registers while holding the relevant locks via ACPI.
> >
> > I assume all of those mechanisms have no place in a proper kernel driver but
> > would like to get your opinion on it.
>
> Actually the "Directly access the it87 registers" option is potentially interesting
> since it will allow using the it87 driver which gives a lot more features.
>
> I actually wrote a rough outline of how something like this could work here:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=204807#c47

I must have overread this one, but yes that's also what I had in mind.

> Note I'm not sure if that is the right approach, but it definitely is an
> option. It seems that this one might also solve the X470-AORUS-ULTRA-GAMING
> case (https://github.com/t-8ch/linux-gigabyte-wmi-driver/issues/2)
>
> Hopefully the direct-register ACPI/WMI access methods will also allow
> reading the super-io vendor and product ids so that we can be reasonably
> sure that we are not loading the wrong driver on a board.
>
> OTOH the WMI-temp method approach may also work on boards where the sensors
> (or some of the sensors) are not supported.
>
> I'm afraid there is no obviously correct answer here. If you feel like it
> experimenting with the "Directly access the it87 registers" option would certainly
> be interesting IMHO.
>
> It might be good to get hwmon subsystems maintainer's opinion on this
> before sinking a lot of time into this though (added to the Cc).

Sounds good.