Re: [PATCH] gpiolib: cdev: fix NULL-pointer dereferences
From: Bartosz Golaszewski
Date: Fri Nov 25 2022 - 11:49:53 EST
On Fri, Nov 25, 2022 at 5:24 PM Kent Gibson <warthog618@xxxxxxxxx> wrote:
>
> On Fri, Nov 25, 2022 at 04:32:57PM +0100, Bartosz Golaszewski wrote:
> > From: Bartosz Golaszewski <bartosz.golaszewski@xxxxxxxxxx>
> >
> > There are several places where we can crash the kernel by requesting
> > lines, unbinding the GPIO device, then calling any of the system calls
> > relevant to the GPIO character device's annonymous file descriptors:
> > ioctl(), read(), poll().
> >
> > While I observed it with the GPIO simulator, it will also happen for any
> > of the GPIO devices that can be hot-unplugged - for instance any HID GPIO
> > expander (e.g. CP2112).
> >
> > This affects both v1 and v2 uAPI.
> >
> > Fix this by simply checking if the GPIO chip pointer is not NULL.
> >
>
> Fixes: ??
>
> And split for v1 and v2 as the Fixes for those will differ?
>
> > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@xxxxxxxxxx>
> > ---
> > drivers/gpio/gpiolib-cdev.c | 15 +++++++++++++++
> > 1 file changed, 15 insertions(+)
> >
> > diff --git a/drivers/gpio/gpiolib-cdev.c b/drivers/gpio/gpiolib-cdev.c
> > index 0cb6b468f364..d5632742942a 100644
> > --- a/drivers/gpio/gpiolib-cdev.c
> > +++ b/drivers/gpio/gpiolib-cdev.c
> > @@ -201,6 +201,9 @@ static long linehandle_ioctl(struct file *file, unsigned int cmd,
> > unsigned int i;
> > int ret;
> >
> > + if (!lh->gdev->chip)
> > + return -ENODEV;
> > +
>
> Is there anything to prevent the chip being removed by another thread
> between this check and subsequent usage?
>
Eh... not really, no. The issue we have here seems to be the same as
the one Laurent Pinchart described back in 2015[1] and revisited
during his 2022 linux plumbers presentation[2], except he blamed it on
devres, whereas I think the problem is much deeper and devres has
nothing to do with it.
Ideally we'd need a global fortifying of the driver model against
hot-unplug related device unbinding.
After a quick glance at the relevant code, I think we'd need to add
some flag to struct file for the vfs to check on any fs operation and
return an error early if user-space tries to operate on an fd
associated with a removed device. I'm not sure yet if that's feasible
globally or even the right solution at all - just brainstorming here.
Then at the subsystem level, the GPIO device struct would need a lock
that would be taken by every user-space operation AND the code
unregistering the device so that we don't do what you described (i.e.
if there's a thread doing a read(), then let's wait until it returns
before we drop the device).
This wouldn't fix the case in which the same situation happened in a
kernel driver but crashing the kernel from within is a much lesser
offense than allowing user-space to crash it.
So this patch is just papering over for now I suppose.
Bart
[1] https://lkml.org/lkml/2015/7/14/741
[2] https://www.youtube.com/watch?v=kW8LHWlJPTU