Re: [PATCH v2] nvmem: core: Fix race in nvmem_register()

From: Russell King (Oracle)
Date: Tue Jan 03 2023 - 11:03:42 EST


On Wed, Jan 04, 2023 at 12:33:33AM +0900, Hector Martin wrote:
> On 04/01/2023 00.18, Russell King (Oracle) wrote:
> > On Tue, Jan 03, 2023 at 11:56:21PM +0900, Hector Martin wrote:
> >> On 03/01/2023 23.22, Srinivas Kandagatla wrote:
> >>>>>> drivers/nvmem/core.c | 32 +++++++++++++++++---------------
> >>>>>> 1 file changed, 17 insertions(+), 15 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/nvmem/core.c b/drivers/nvmem/core.c
> >>>>>> index 321d7d63e068..606f428d6292 100644
> >>>>>> --- a/drivers/nvmem/core.c
> >>>>>> +++ b/drivers/nvmem/core.c
> >>>>>> @@ -822,11 +822,8 @@ struct nvmem_device *nvmem_register(const struct nvmem_config *config)
> >>>>>>  		break;
> >>>>>>  	}
> >>>>>>  
> >>>>>> -	if (rval) {
> >>>>>> -		ida_free(&nvmem_ida, nvmem->id);
> >>>>>> -		kfree(nvmem);
> >>>>>> -		return ERR_PTR(rval);
> >>>>>> -	}
> >>>>>> +	if (rval)
> >>>>>> +		goto err_gpiod_put;
> >>>>>
> >>>>> Why were the gpiod changes added to this patch? That should be a separate
> >>>>> patch/discussion, as it is not relevant to the issue that you are
> >>>>> reporting.
> >>>>
> >>>> Because freeing the device also does a gpiod_put in the destructor, so
> >>> These are clearly untested, and I don't want this mixed into the
> >>> fix for the issue you are hitting.
> >>
> >> I somehow doubt you tested any of these error paths either. Nobody tests
> >> initialization error paths. That's why there was a gpio leak here to
> >> begin with.
> >
> > Sadly, this is one of the biggest problems with error paths: they get
> > very little proper testing, and in most cases we're reliant on
> > reviewers spotting errors. That's why we much prefer the devm_* stuff,
> > but even that can be error-prone.
> >
> >>> We should always be careful about untested changes; in this case gpiod
> >>> has some conditions to check before doing a put, so the patch is
> >>> incorrect as it is.
> >>
> >> Then the existing code is also incorrect as it is, because the device
> >> release callback is doing the same gpiod_put() already. I just moved it
> >> out since we are now registering the device later.
> >
> > At the point where this change is being made (checking rval after
> > dev_set_name()) the struct device has not been initialised, so the
> > release callback will not be called. nvmem->wp_gpio will be leaked.
>
> But later in the code, where put_device() was being called, it will be,
> and that callback is calling gpiod_put() unconditionally, which is why I
> am doing the same after moving the device registration later.
>
> Is this wrong? Well,

I'm not going to read the rest of your rant; honestly, it's really not
worth it. Let's just concentrate on working out how best to fix
this crud.
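A minimal userspace sketch of the lifetime rule discussed above, assuming a simplified refcounted object: before the object is initialised, the error path has to undo each step by hand (skipping the GPIO put there is the wp_gpio leak described earlier in the thread); after initialisation, dropping the last reference runs the release callback, which does the cleanup. `struct fake_device`, `fake_put_device()`, `fake_register()` and the `gpios_held` counter are hypothetical stand-ins, not the driver core or gpiod APIs, and this is not the actual nvmem fix.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical counter standing in for a held write-protect GPIO. */
static int gpios_held;

/* Hypothetical stand-in for a refcounted struct device. */
struct fake_device {
	int refcount;
	int has_gpio;
};

/* Plays the role of the release callback: frees everything the object owns. */
static void fake_release(struct fake_device *dev)
{
	if (dev->has_gpio)
		gpios_held--;           /* the "gpiod_put()" done on release */
	free(dev);
}

static void fake_device_initialize(struct fake_device *dev)
{
	dev->refcount = 1;
}

static void fake_put_device(struct fake_device *dev)
{
	if (--dev->refcount == 0)
		fake_release(dev);
}

/*
 * fail_point: 0 = succeed, 1 = fail before fake_device_initialize()
 * (like dev_set_name() failing), 2 = fail after it.
 */
static struct fake_device *fake_register(int fail_point)
{
	struct fake_device *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;

	dev->has_gpio = 1;
	gpios_held++;                   /* the "gpiod_get()" */

	if (fail_point == 1) {
		/* No release callback yet: undo every step by hand. */
		gpios_held--;
		free(dev);
		return NULL;
	}

	fake_device_initialize(dev);

	if (fail_point == 2) {
		/* Refcounted now: dropping the last reference runs release. */
		fake_put_device(dev);
		return NULL;
	}

	return dev;                     /* caller owns one reference */
}
```

The point of the split is that each error path uses exactly one cleanup mechanism: doing a manual put and then also dropping the last reference would put the GPIO twice.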

Not only is there the issue with wp_gpio, but the whole IDA handling
is fscked as well, so there are many problems to be sorted out here.
If we lump them all into one patch, we'll probably end up
completely rewriting nvmem_register(), making backports
extremely difficult.
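For the wider cleanup, one common kernel pattern is a goto-based unwind ladder that releases resources in the reverse order they were acquired. The sketch below assumes three steps (an ID allocation standing in for the IDA, the write-protect GPIO, and the device name), tracked by hypothetical counters; `fake_register()`, `fake_unregister()` and `all_error_paths_balanced()` are illustrative names, not nvmem_register() itself. The helper injects a failure at every step, which is one cheap way to exercise error paths that otherwise rarely get tested.

```c
#include <assert.h>

/* Hypothetical resource counters; all must be 0 after a failed register. */
static int ids_held, gpios_held, names_held;

/* Returns 0 on success, -1 on an injected failure at step fail_at (1..3). */
static int fake_register(int fail_at)
{
	if (fail_at == 1)
		goto err_out;
	ids_held++;                  /* step 1: allocate an ID ("ida_alloc()") */

	if (fail_at == 2)
		goto err_ida_free;
	gpios_held++;                /* step 2: get the write-protect GPIO */

	if (fail_at == 3)
		goto err_gpio_put;
	names_held++;                /* step 3: set the device name */

	return 0;

	/* Unwind strictly in reverse order of acquisition. */
err_gpio_put:
	gpios_held--;
err_ida_free:
	ids_held--;
err_out:
	return -1;
}

/* Undo a successful fake_register(), again in reverse order. */
static void fake_unregister(void)
{
	names_held--;
	gpios_held--;
	ids_held--;
}

/* Inject a failure at every step and check that nothing leaks. */
static int all_error_paths_balanced(void)
{
	for (int fail_at = 1; fail_at <= 3; fail_at++) {
		if (fake_register(fail_at) != -1)
			return 0;
		if (ids_held || gpios_held || names_held)
			return 0;
	}
	if (fake_register(0) != 0)
		return 0;
	fake_unregister();
	return !ids_held && !gpios_held && !names_held;
}
```

Keeping each step's undo next to its own label can also make it easier to fix one leak per patch, which keeps the individual changes small enough to backport.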

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!