Re: [PATCH v2 1/2] hwmon: add ChromeOS EC driver

From: Stephen Horvath
Date: Tue May 28 2024 - 20:58:47 EST


Hi Guenter,

On 29/5/24 09:29, Guenter Roeck wrote:
On 5/28/24 09:15, Thomas Weißschuh wrote:
On 2024-05-28 08:50:49+0000, Guenter Roeck wrote:
On 5/27/24 17:15, Stephen Horvath wrote:
On 28/5/24 05:24, Thomas Weißschuh wrote:
On 2024-05-25 09:13:09+0000, Stephen Horvath wrote:
I was the one to implement fan monitoring/control into Dustin's driver, and
just had a quick comment for your driver:

On 8/5/24 02:29, Thomas Weißschuh wrote:
The ChromeOS Embedded Controller exposes fan speed and temperature
readings.
Expose this data through the hwmon subsystem.

The driver is designed to be probed via the cros_ec mfd device.

Signed-off-by: Thomas Weißschuh <linux@xxxxxxxxxxxxxx>
---
    Documentation/hwmon/cros_ec_hwmon.rst |  26 ++++
    Documentation/hwmon/index.rst         |   1 +
    MAINTAINERS                           |   8 +
    drivers/hwmon/Kconfig                 |  11 ++
    drivers/hwmon/Makefile                |   1 +
    drivers/hwmon/cros_ec_hwmon.c         | 269 ++++++++++++++++++++++++++++++++++
    6 files changed, 316 insertions(+)


<snip>

diff --git a/drivers/hwmon/cros_ec_hwmon.c b/drivers/hwmon/cros_ec_hwmon.c
new file mode 100644
index 000000000000..d59d39df2ac4
--- /dev/null
+++ b/drivers/hwmon/cros_ec_hwmon.c
@@ -0,0 +1,269 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ *  ChromeOS EC driver for hwmon
+ *
+ *  Copyright (C) 2024 Thomas Weißschuh <linux@xxxxxxxxxxxxxx>
+ */
+
+#include <linux/device.h>
+#include <linux/hwmon.h>
+#include <linux/kernel.h>
+#include <linux/mod_devicetable.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/platform_data/cros_ec_commands.h>
+#include <linux/platform_data/cros_ec_proto.h>
+#include <linux/units.h>
+
+#define DRV_NAME    "cros-ec-hwmon"
+
+struct cros_ec_hwmon_priv {
+    struct cros_ec_device *cros_ec;
+    u8 thermal_version;
+    const char *temp_sensor_names[EC_TEMP_SENSOR_ENTRIES + EC_TEMP_SENSOR_B_ENTRIES];
+};
+
+static int cros_ec_hwmon_read_fan_speed(struct cros_ec_device *cros_ec, u8 index, u16 *speed)
+{
+    u16 data;
+    int ret;
+
+    ret = cros_ec->cmd_readmem(cros_ec, EC_MEMMAP_FAN + index * 2, 2, &data);
+    if (ret < 0)
+        return ret;
+
+    data = le16_to_cpu(data);
+
+    if (data == EC_FAN_SPEED_NOT_PRESENT)
+        return -ENODEV;
+

Don't forget it can also return `EC_FAN_SPEED_STALLED`.

Thanks for the hint. I'll need to think about how to handle this better.

Like Guenter, I also don't like returning `-ENODEV`, but I don't have a
problem with checking for `EC_FAN_SPEED_NOT_PRESENT` in case it was removed
since init or something.


That won't happen. Chromebooks are not servers, where one might be able to
replace a fan tray while the system is running.

In one of my test runs this actually happened.
When running on battery, one specific CPU sensor sporadically
returned EC_FAN_SPEED_NOT_PRESENT.


What Chromebook was that ? I can't see the code path in the EC source
that would get me there.


I believe Thomas and I both have the Framework 13 AMD; the source code is here: https://github.com/FrameworkComputer/EmbeddedController/tree/lotus-zephyr

The organisation confuses me a little, but Dustin has previously said on the Framework forums (https://community.frame.work/t/what-ec-is-used/38574/2):

"This one is based on the Zephyr port of the ChromeOS EC, and tracks mainline more closely. It is in the branch lotus-zephyr.
All of the model-specific code lives in zephyr/program/lotus.
The 13"-specific code lives in a few subdirectories off the main tree named azalea."


Also, I just unplugged my fan and you are definitely correct: the EC only generates EC_FAN_SPEED_NOT_PRESENT for fans it does not have the capability to support. Even after a reboot it just returns 0 RPM for an unplugged fan. I thought about simulating a stall too, but I was mildly scared I was going to break one of the tiny blades.

Ok.

My approach was to return the speed as `0`, since the fan probably isn't
spinning, but set HWMON_F_FAULT for `EC_FAN_SPEED_NOT_PRESENT` and
HWMON_F_ALARM for `EC_FAN_SPEED_STALLED`.
No idea if this is correct though.

I'm not a fan of returning a speed of 0 in case of errors.
Rather -EIO which can't be mistaken.
Maybe -EIO for both EC_FAN_SPEED_NOT_PRESENT (which should never happen)
and also for EC_FAN_SPEED_STALLED.

Yeah, that's pretty reasonable.


-EIO is an i/o error. I have trouble reconciling that with
EC_FAN_SPEED_NOT_PRESENT or EC_FAN_SPEED_STALLED.

Looking into the EC source code [1], I see:

EC_FAN_SPEED_NOT_PRESENT means that the fan is not present.
That should return -ENODEV in the above code, but only for
the purpose of making the attribute invisible.

EC_FAN_SPEED_STALLED means exactly that, i.e., that the fan
is present but not turning. The EC code does not expect that
to happen and generates a thermal event in case it does.
Given that, it does make sense to set the fault flag.
The actual fan speed value should then be reported as 0 or
possibly -ENODATA. It should _not_ generate any other error
because that would trip up the "sensors" command for no
good reason.

Ack.

Currently I have the following logic (for both fans and temp):

if NOT_PRESENT during probing:
   make the attribute invisible.

if any error during runtime (including NOT_PRESENT):
   return -ENODATA and a FAULT

This should also handle the sporadic NOT_PRESENT failures.

What do you think?

Is there any other feedback to this revision or should I send the next?


No, except I'd really like to know which Chromebook randomly generates
an EC_FAN_SPEED_NOT_PRESENT response because that really looks like a bug.
Also, can you reproduce the problem with the ectool command ?

I have a feeling it was related to the concurrency problems between ACPI and the CrOS code that are being fixed in another patch by Ben Walsh. I was also seeing some weird behaviour at times, but I *believe* that patch fixed it.

Thanks,
Steve