Re: [PATCH v2 1/2] hwmon: add ChromeOS EC driver

From: Thomas Weißschuh
Date: Tue May 28 2024 - 12:16:09 EST


On 2024-05-28 08:50:49+0000, Guenter Roeck wrote:
> On 5/27/24 17:15, Stephen Horvath wrote:
> > On 28/5/24 05:24, Thomas Weißschuh wrote:
> > > On 2024-05-25 09:13:09+0000, Stephen Horvath wrote:
> > > > I was the one to implement fan monitoring/control into Dustin's driver, and
> > > > just had a quick comment for your driver:
> > > >
> > > > On 8/5/24 02:29, Thomas Weißschuh wrote:
> > > > > The ChromeOS Embedded Controller exposes fan speed and temperature
> > > > > readings.
> > > > > Expose this data through the hwmon subsystem.
> > > > >
> > > > > The driver is designed to be probed via the cros_ec mfd device.
> > > > >
> > > > > Signed-off-by: Thomas Weißschuh <linux@xxxxxxxxxxxxxx>
> > > > > ---
> > > > >    Documentation/hwmon/cros_ec_hwmon.rst |  26 ++++
> > > > >    Documentation/hwmon/index.rst         |   1 +
> > > > >    MAINTAINERS                           |   8 +
> > > > >    drivers/hwmon/Kconfig                 |  11 ++
> > > > >    drivers/hwmon/Makefile                |   1 +
> > > > >    drivers/hwmon/cros_ec_hwmon.c         | 269 ++++++++++++++++++++++++++++++++++
> > > > >    6 files changed, 316 insertions(+)
> > > > >
> > >
> > > <snip>
> > >
> > > > > diff --git a/drivers/hwmon/cros_ec_hwmon.c b/drivers/hwmon/cros_ec_hwmon.c
> > > > > new file mode 100644
> > > > > index 000000000000..d59d39df2ac4
> > > > > --- /dev/null
> > > > > +++ b/drivers/hwmon/cros_ec_hwmon.c
> > > > > @@ -0,0 +1,269 @@
> > > > > +// SPDX-License-Identifier: GPL-2.0-or-later
> > > > > +/*
> > > > > > + *  ChromeOS EC driver for hwmon
> > > > > + *
> > > > > + *  Copyright (C) 2024 Thomas Weißschuh <linux@xxxxxxxxxxxxxx>
> > > > > + */
> > > > > +
> > > > > +#include <linux/device.h>
> > > > > +#include <linux/hwmon.h>
> > > > > +#include <linux/kernel.h>
> > > > > +#include <linux/mod_devicetable.h>
> > > > > +#include <linux/module.h>
> > > > > +#include <linux/platform_device.h>
> > > > > +#include <linux/platform_data/cros_ec_commands.h>
> > > > > +#include <linux/platform_data/cros_ec_proto.h>
> > > > > +#include <linux/units.h>
> > > > > +
> > > > > +#define DRV_NAME    "cros-ec-hwmon"
> > > > > +
> > > > > +struct cros_ec_hwmon_priv {
> > > > > +    struct cros_ec_device *cros_ec;
> > > > > +    u8 thermal_version;
> > > > > +    const char *temp_sensor_names[EC_TEMP_SENSOR_ENTRIES + EC_TEMP_SENSOR_B_ENTRIES];
> > > > > +};
> > > > > +
> > > > > +static int cros_ec_hwmon_read_fan_speed(struct cros_ec_device *cros_ec, u8 index, u16 *speed)
> > > > > +{
> > > > > +    u16 data;
> > > > > +    int ret;
> > > > > +
> > > > > +    ret = cros_ec->cmd_readmem(cros_ec, EC_MEMMAP_FAN + index * 2, 2, &data);
> > > > > +    if (ret < 0)
> > > > > +        return ret;
> > > > > +
> > > > > +    data = le16_to_cpu(data);
> > > > > +
> > > > > +    if (data == EC_FAN_SPEED_NOT_PRESENT)
> > > > > +        return -ENODEV;
> > > > > +
> > > >
> > > > Don't forget it can also return `EC_FAN_SPEED_STALLED`.
> > >
> > > Thanks for the hint. I'll need to think about how to handle this better.
> > >
> > > > Like Guenter, I also don't like returning `-ENODEV`, but I don't have a
> > > > problem with checking for `EC_FAN_SPEED_NOT_PRESENT` in case it was removed
> > > > since init or something.
> > >
>
> That won't happen. Chromebooks are not servers, where one might be able to
> replace a fan tray while the system is running.

In one of my test runs this actually happened.
When running on battery, one specific CPU sensor sporadically
returned EC_FAN_SPEED_NOT_PRESENT.

> > > Ok.
> > >
> > > > My approach was to return the speed as `0`, since the fan probably isn't
> > > > spinning, but set HWMON_F_FAULT for `EC_FAN_SPEED_NOT_PRESENT` and
> > > > HWMON_F_ALARM for `EC_FAN_SPEED_STALLED`.
> > > > No idea if this is correct though.
> > >
> > > I'm not a fan of returning a speed of 0 in case of errors.
> > > Rather -EIO which can't be mistaken.
> > > Maybe -EIO for both EC_FAN_SPEED_NOT_PRESENT (which should never happen)
> > > and also for EC_FAN_SPEED_STALLED.
> >
> > Yeah, that's pretty reasonable.
> >
>
> -EIO is an i/o error. I have trouble reconciling that with
> EC_FAN_SPEED_NOT_PRESENT or EC_FAN_SPEED_STALLED.
>
> Looking into the EC source code [1], I see:
>
> EC_FAN_SPEED_NOT_PRESENT means that the fan is not present.
> That should return -ENODEV in the above code, but only for
> the purpose of making the attribute invisible.
>
> EC_FAN_SPEED_STALLED means exactly that, i.e., that the fan
> is present but not turning. The EC code does not expect that
> to happen and generates a thermal event in case it does.
> Given that, it does make sense to set the fault flag.
> The actual fan speed value should then be reported as 0 or
> possibly -ENODATA. It should _not_ generate any other error
> because that would trip up the "sensors" command for no
> good reason.

Ack.

Currently I have the following logic (for both fans and temp):

if NOT_PRESENT during probing:
    make the attribute invisible.

if any error during runtime (including NOT_PRESENT):
    return -ENODATA and a FAULT

This should also handle the sporadic NOT_PRESENT failures.
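
As a rough illustration of the logic above for the fan case, here is a
minimal userspace sketch, not the driver code. The helper names are
hypothetical, the two constants carry the values from cros_ec_commands.h,
and treating EC_FAN_SPEED_STALLED as a runtime error is an assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Same values as in cros_ec_commands.h. */
#define EC_FAN_SPEED_NOT_PRESENT 0xffff
#define EC_FAN_SPEED_STALLED     0xfffe

/* Probe time: hide the attribute only when the EC reports no fan. */
static bool fan_visible_at_probe(uint16_t raw)
{
	return raw != EC_FAN_SPEED_NOT_PRESENT;
}

/*
 * Runtime fanN_input: any failure, including a sporadic NOT_PRESENT,
 * becomes -ENODATA. Handling STALLED the same way is an assumption.
 */
static int fan_input(int read_ret, uint16_t raw, long *rpm)
{
	if (read_ret < 0)
		return -ENODATA;
	if (raw == EC_FAN_SPEED_NOT_PRESENT || raw == EC_FAN_SPEED_STALLED)
		return -ENODATA;

	*rpm = raw;
	return 0;
}

/* Runtime fanN_fault: 1 whenever fanN_input would fail, else 0. */
static long fan_fault(int read_ret, uint16_t raw)
{
	long rpm;

	return fan_input(read_ret, raw, &rpm) < 0 ? 1 : 0;
}
```

With this split, a fan that was present at probe time keeps its
attributes, and a later sporadic NOT_PRESENT reading only shows up as
-ENODATA plus a set fault flag.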

What do you think?

Is there any other feedback on this revision, or should I send the next one?


Thanks,
Thomas