Re: [PATCH 2/3] hwmon: surface_temp: Add support for sensor names

From: Guenter Roeck
Date: Tue Apr 16 2024 - 17:08:34 EST


On Tue, Apr 16, 2024 at 09:00:05PM +0200, Maximilian Luz wrote:
> On 4/16/24 3:30 PM, Guenter Roeck wrote:
> > On Sat, Mar 30, 2024 at 12:24:01PM +0100, Maximilian Luz wrote:
>
> [...]
>
> > > +static int ssam_tmp_get_name(struct ssam_device *sdev, u8 iid, char *buf, size_t buf_len)
> > > +{
> > > +	struct ssam_tmp_get_name_rsp name_rsp;
> > > +	int status;
> > > +
> > > +	status = __ssam_tmp_get_name(sdev->ctrl, sdev->uid.target, iid, &name_rsp);
> > > +	if (status)
> > > +		return status;
> > > +
> > > +	/*
> > > +	 * This should not fail unless the name in the returned struct is not
> > > +	 * null-terminated or someone changed something in the struct
> > > +	 * definitions above, since our buffer and struct have the same
> > > +	 * capacity by design. So if this fails blow this up with a warning.
> > > +	 * Since the more likely cause is that the returned string isn't
> > > +	 * null-terminated, we might have received garbage (as opposed to just
> > > +	 * an incomplete string), so also fail the function.
> > > +	 */
> > > +	status = strscpy(buf, name_rsp.name, buf_len);
> > > +	WARN_ON(status < 0);
> >
> > Not acceptable. From include/asm-generic/bug.h:
> >
> > * Do not use these macros when checking for invalid external inputs
> > * (e.g. invalid system call arguments, or invalid data coming from
> > * network/devices), and on transient conditions like ENOMEM or EAGAIN.
> > * These macros should be used for recoverable kernel issues only.
> >
>
> Hmm, I always interpreted that as "do not use for checking user-defined
> input", which this is not.
>

"invalid data coming from network/devices" is not user-defined input.

> The reason I added/requested it here was to check for "bugs" in how we
> think the interface behaves (and our definitions related to it) as the
> interface was reverse-engineered. Generally, when this fails I expect
> that we made some mistake in our code (or the things we assume about the
> interface), which likely causes us to interpret the received data as
> "garbage" (and not the EC sending corrupted data, which it is generally
> not due to CRC checking and validation in the SAM driver). Hence, I
> personally would prefer if this blows up in a big warning with a trace
> attached to it, so that an end-user can easily report this to us and
> that we can appropriately deal with it. As opposed to some one-line
> error message that will likely get overlooked or not taken as seriously.
>

I have heard the "This backtrace is absolutely essential" argument before,
including the "will be fixed" part. Chromebooks report more than 500,000
warning backtraces _every single day_. None of them is getting fixed.

> If you still insist, I could change that to a dev_err() message. Or
> maybe make the comment a bit clearer.

dev_err() would be acceptable. WARN() or WARN_ON() are no-go.

Guenter
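
For illustration only, the dev_err() variant agreed on above might look
roughly like the following userspace sketch. It is not the code from the
patch or from any later revision. All sketch_* names and SKETCH_NAME_LEN
are hypothetical; sketch_strscpy() is a simplified stand-in for the
kernel's strscpy(), and fprintf(stderr, ...) stands in for dev_err().

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical capacity; the real response struct is defined in the patch. */
#define SKETCH_NAME_LEN 8

/* Hypothetical stand-in for the response struct holding the sensor name. */
struct sketch_name_rsp {
	char name[SKETCH_NAME_LEN];
};

/*
 * Simplified userspace stand-in for the kernel's strscpy(): copy at most
 * size - 1 bytes, always NUL-terminate dst when size > 0, and return the
 * number of bytes copied or -E2BIG if src did not fit. Like the original,
 * it may read up to size bytes from src, so src must be at least that big.
 */
static long sketch_strscpy(char *dst, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -E2BIG;

	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	return src[i] == '\0' ? (long)i : -E2BIG;
}

/*
 * Copy the name out of the response. On failure, log a one-line error and
 * fail the call instead of triggering a warning backtrace.
 */
static int sketch_get_name(const struct sketch_name_rsp *rsp,
			   char *buf, size_t buf_len)
{
	long status = sketch_strscpy(buf, rsp->name, buf_len);

	if (status < 0) {
		/*
		 * Assumption: in the driver this would be a dev_err() call
		 * on the device, not a WARN_ON().
		 */
		fprintf(stderr, "received non-null-terminated sensor name string\n");
		return (int)status;
	}

	return 0;
}
```

With a buffer of the same capacity as the struct field, a properly
terminated name is copied and 0 is returned. A name that fills the whole
field without a terminator yields -E2BIG and a single error line, while
buf still ends up NUL-terminated.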