Re: [PATCH v3 1/2] tpm: fix reference counting for struct tpm_chip

From: Jason Gunthorpe
Date: Fri Feb 05 2021 - 21:45:12 EST


On Fri, Feb 05, 2021 at 10:50:02PM +0100, Lino Sanfilippo wrote:
> On 05.02.21 at 16:58, Jason Gunthorpe wrote:
> eference in the first place).
> >
> > No, they are all chained together because they are all in the same
> > struct:
> >
> > struct tpm_chip {
> > struct device dev;
> > struct device devs;
> > struct cdev cdev;
> > struct cdev cdevs;
> >
> > dev holds the refcount on memory, when it goes 0 the whole thing is
> > kfreed.
> >
> > The rule is dev's refcount can't go to zero while any other refcount
> > is != 0.
> >
> > For instance devs holds a get on dev that is put back only when devs
> > goes to 0:
> >
> > static void tpm_devs_release(struct device *dev)
> > {
> > struct tpm_chip *chip = container_of(dev, struct tpm_chip, devs);
> >
> > /* release the master device reference */
> > put_device(&chip->dev);
> > }
> >
> > Both cdev elements do something similar inside the cdev layer.
>
> Well this chaining is exactly what does not work nowadays and what the patch is supposed
> to fix: currently we dont ever take the extra ref (not even in TPM 2 case, note that
> TPM_CHIP_FLAG_TMP2 is never set), so
>
> - if (chip->flags & TPM_CHIP_FLAG_TPM2)
> - get_device(&chip->dev);
> + get_device(&chip->dev);

Oh, hah, yes that is busted up. The patch sketch I sent to James is
the right way to handle it, feel free to take it up

> and tpm_devs_release() is never called, since there is nothing that ever puts devs, so

Yes, that is a pre-existing memory leak

> The race with only get_device()/putdevice() in tpm_common_open()/tpm_common_release() is:

The refcount handling is busted up and not working the way it is
designed, when that is fixed there is no race.

Jason