Re: [PATCH] tpm: don't destroy chip device prematurely

From: Jarkko Sakkinen
Date: Wed Oct 05 2016 - 06:02:42 EST


On Tue, Oct 04, 2016 at 10:47:38AM -0600, Jason Gunthorpe wrote:
> On Tue, Oct 04, 2016 at 08:19:46AM +0300, Jarkko Sakkinen wrote:
>
> > > Make the driver uncallable first. The worst race that can happen is that
> > > open("/dev/tpm0", ...) returns -EPIPE. I do not consider this fatal at
> > > all.
> >
> > No responses for this reasonable proposal so I'll show what I mean:
>
> How is this any better than what Thomas proposed? It seems much worse
> to me since now we have even more stuff in the wrong order.

It moves a logical block to the front instead of moving one thing
from one logical block to another place.

I'll repeat my question: what worse can happen than returning -EPIPE? I
though the whole rw lock scheme was introduced just for this purpose.

Why there's even that branch in tpm-dev.c if it's so bad to let it
happen?

/Jarkko

> There are three purposes to the ordering as it stands today
> 1) To guarantee that tpm2_shutdown is the last command delivered to
> the TPM. When it is issued all other ways to access the device
> are hard fenced off.
> 2) To hard fence the tpm subsystem for the 'platform' driver. Once
> tpm_del_char_device completes no callback into the driver
> is possible *at all*. The driver can destroy everything
> (iounmap, dereg irq, etc) and the driver module can be unloaded.
> 3) To prevent oopsing with the sysfs code. Recall this comment
>
> /* The sysfs routines rely on an implicit tpm_try_get_ops, device_del
> * is called before ops is null'd and the sysfs core synchronizes this
> * removal so that no callbacks are running or can run again
> */
>
> device_del is what eliminates the sysfs access path, so
> ordering device_del after ops = null is just unconditionally
> wrong.
>
> I still haven't heard an explanation why Thomas's other patches need
> this, or why trying to change this ordering makes any sense at
> all considering how the subsystem is constructed.
>
> Further, if tpm_crb now needs a registered device, how on earth do all
> the chip ops we call work *before* registration? Or is that another
> bug?
>
> Why can't tpm_crb return to the pre-registration operating state
> in the driver remove function before calling unregister?
>
> None of this makes any sense to me.
>
> This whole thing was very carefully constructed to work *correctly*
> during unregister. Many other subsystems have races and bugs during
> remove (eg see the securityfs discussion). TPM has a hard requirement
> to support safe unregister due to the vtpm stuff, so we don't get to
> screw it up just to support one driver.
>
> Jason