Re: [PATCH] platform/chrome: cros_ec_proto: Fix check_features ret val

From: Prashant Malani
Date: Wed Sep 22 2021 - 14:48:12 EST


Hi Enric,

On Wed, Sep 22, 2021 at 7:09 AM Enric Balletbo i Serra
<enric.balletbo@xxxxxxxxxxxxx> wrote:
>
> Hi Prashant,
>
> On 22/9/21 12:55, Prashant Malani wrote:
> > Hi Enric,
> >
> > On Wed, Sep 22, 2021 at 2:12 AM Enric Balletbo i Serra
> > <enric.balletbo@xxxxxxxxxxxxx> wrote:
> >>
> >> Hi Prashant,
> >>
> >> On 21/9/21 19:29, Prashant Malani wrote:
> >>> Hi Enric,
> >>>
> >>> Thanks for reviewing the patch.
> >>>
> >>> On Tue, Sep 21, 2021 at 01:42:04PM +0200, Enric Balletbo i Serra wrote:
> >>>> Hi Prashant,
> >>>>
> >>>> Thank you for the patch. Just one comment below ...
> >>>>
> >>>> On 16/9/21 3:46, Prashant Malani wrote:
> >>>>> The kerneldoc for cros_ec_check_features() states that it returns 1 or 0
> >>>>> depedending on whether a feature is supported or not, but it instead
> >>>>> returns a negative error number in one case, and a non-1 bitmask in
> >>>>> other cases.
> >>>>>
> >>>>> Since all call-sites only check for a 1 or 0 return value, update
> >>>>> the function to return boolean values.
> >>>>>
> >>>>> Signed-off-by: Prashant Malani <pmalani@xxxxxxxxxxxx>
> >>>>> ---
> >>>>> drivers/platform/chrome/cros_ec_proto.c | 12 +++++++-----
> >>>>> include/linux/platform_data/cros_ec_proto.h | 2 +-
> >>>>> 2 files changed, 8 insertions(+), 6 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/platform/chrome/cros_ec_proto.c b/drivers/platform/chrome/cros_ec_proto.c
> >>>>> index a7404d69b2d3..a34cf58c5ef7 100644
> >>>>> --- a/drivers/platform/chrome/cros_ec_proto.c
> >>>>> +++ b/drivers/platform/chrome/cros_ec_proto.c
> >>>>> @@ -808,9 +808,9 @@ EXPORT_SYMBOL(cros_ec_get_host_event);
> >>>>> *
> >>>>> * Call this function to test whether the ChromeOS EC supports a feature.
> >>>>> *
> >>>>> - * Return: 1 if supported, 0 if not
> >>>>> + * Return: true if supported, false if not (or if an error was encountered).
> >>>>> */
> >>>>> -int cros_ec_check_features(struct cros_ec_dev *ec, int feature)
> >>>>> +bool cros_ec_check_features(struct cros_ec_dev *ec, int feature)
> >>>>> {
> >>>>> struct cros_ec_command *msg;
> >>>>> int ret;
> >>>>> @@ -818,8 +818,10 @@ int cros_ec_check_features(struct cros_ec_dev *ec, int feature)
> >>>>> if (ec->features[0] == -1U && ec->features[1] == -1U) {
> >>>>> /* features bitmap not read yet */
> >>>>> msg = kzalloc(sizeof(*msg) + sizeof(ec->features), GFP_KERNEL);
> >>>>> - if (!msg)
> >>>>> - return -ENOMEM;
> >>>>> + if (!msg) {
> >>>>> + dev_err(ec->dev, "failed to allocate memory to get EC features\n");
> >>>>
> >>>> In case of failure you will be noticed by the allocator, prints after
> >>>> [k|v][m|z|c]alloc() functions are not needed, so I think you can just return
> >>>> false here.
> >>>>>
> >>>
> >>> Makes sense; I can make the change, but I had one question:
> >>>
> >>> If we solely return false, how will we tell from the logs that the
> >>> allocation error message was associated with this driver? Only returning
> >>> false means the driver probe (e.g cros-ec-typec) will continue (just assuming a certain feature
> >>> is not present). Wouldn't having this error message make this clear?
> >>>
> >>
> >> So I tried to find some doc about this without luck. But I think it has been an
> >> unwritten rule that GFP_KERNEL allocations for small allocations will never
> >> fail.
> >
> > That might be the case, but kzalloc() still returns the error value,
> > so even if it is very unlikely, we
> > still need to handle that error.
> >
>
> I'm not saying that we don't need to handle that error, we *must* handle the
> allocation errors, there is no doubts about this, but in this specific function
> we're ignoring all the errors and not noticing to the users. The discussion, or
> my nitpick (it's just a nit), is about if logging that the allocation failed in
> this driver is useful or not.
>
>
> >> If you system fails to allocate that small amount of memory you probably
> >> have bigger problems to solve and the above message is not really useful, even
> >> confusing, as the focus, likely, shouldn't be in this driver to solve the problem.
> >
> > I don't know if I necessarily concur with that rationale ("if it
> > fails, there are likely bigger issues").
> > There could be situations (hypothetical) where a series of allocations
> > might lead to a failure (or this might be a transient allocation
> > failure),
> > and it might be useful to know which driver is contributing the alloc
> > that finally precipitates the failure.
> >
>
> If the kernel is not able to give you a small amount of memory it is just a
> coincidence that this happened in your driver. Dynamic allocations and
> deallocations happens often, the print doesn't gives you more information as it
> could be any other driver because your system runs out of memory, and everybody
> can be affected.
>
> > Also, although it is very unlikely, returning true without an error
>
> I'm not saying this, I'm saying returning false but don't log the error to the
> console.
>
> > can mean the typec driver silently continues to function
>
> I think the consensus on the interface is, and was clear:
>
> 1/True if the feature is supported, 0/false if is not supported or there is an
> error. At some point we decided that the callers don't need to differentiate
> about if the feature is not supported or there is and unlikely error.
>

I wasn't present when this was agreed on, but I don't think that's the
right approach; moreover the current
implementation doesn't adhere to it.
In general I prefer either returning the appropriate error code, or
parsing the error and logging it.

> > wrongly assuming a feature flag to be set a certain way. That is
> > something we need to flag through the logs.
>
> If you run out of memory there will be a lot of logs, trust me ;-)
>
> > I certainly can't see the log message as confusing the reader of a log
> > any further.
> >
>
> As I said there is an unwritten rule (didn't find written doc about it tbh,
> although I know other maintainers follow) that prints after [k|v][m|z|c]alloc()
> functions are not needed.

Right, but in most cases they return -ENOMEM; here the boolean return
type prevents that.

>
> And as I said, this is a nit, I won't strongly oppose if you think that this
> message is really needed.

I'd prefer sticking with this if it's possible; in any case we'll have
a follow up patch soon which gets rid of
the kzalloc entirely (as you suggested) so this discussion will become moot.

Best regards,