Re: [PATCH v2] tpm: Rework open/close/shutdown to avoid races

From: Jason Gunthorpe
Date: Tue Dec 15 2020 - 12:57:22 EST


On Tue, Dec 15, 2020 at 04:38:01PM +0300, Sergey Temerkhanov wrote:
> Avoid race condition at shutdown by shutting down the TPM 2.0
> devices synchronously. This eliminates the condition when the
> shutdown sequence sets chip->ops to NULL leading to the following:
>
> [ 1586.593561][ T8669] tpm2_del_space+0x28/0x73
> [ 1586.598718][ T8669] tpmrm_release+0x27/0x33
> [ 1586.603774][ T8669] __fput+0x109/0x1d
> [ 1586.608380][ T8669] task_work_run+0x7c/0x90
> [ 1586.613414][ T8669] prepare_exit_to_usermode+0xb8/0x128
> [ 1586.619522][ T8669] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 1586.626068][ T8669] RIP: 0033:0x4cb4bb
>
> Signed-off-by: Sergey Temerkhanov <s.temerkhanov@xxxxxxxxx>
> drivers/char/tpm/tpm-chip.c | 2 ++
> drivers/char/tpm/tpm-dev.c | 20 +++++++++++++-------
> drivers/char/tpm/tpmrm-dev.c | 3 +++
> include/linux/tpm.h | 6 ++++--
> 4 files changed, 22 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
> index ddaeceb7e109..e94148b8e180 100644
> +++ b/drivers/char/tpm/tpm-chip.c
> @@ -295,6 +295,7 @@ static int tpm_class_shutdown(struct device *dev)
> {
> struct tpm_chip *chip = container_of(dev, struct tpm_chip, dev);
>
> + wait_event_idle(chip->waitq, !atomic_read(&chip->refcount));
> down_write(&chip->ops_sem);
> if (chip->flags & TPM_CHIP_FLAG_TPM2) {
> if (!tpm_chip_start(chip)) {
> @@ -330,6 +331,7 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev,
>
> mutex_init(&chip->tpm_mutex);
> init_rwsem(&chip->ops_sem);
> + init_waitqueue_head(&chip->waitq);
>
> chip->ops = ops;
>
> diff --git a/drivers/char/tpm/tpm-dev.c b/drivers/char/tpm/tpm-dev.c
> index e2c0baa69fef..8558f0f7382c 100644
> +++ b/drivers/char/tpm/tpm-dev.c
> @@ -19,27 +19,32 @@ static int tpm_open(struct inode *inode, struct file *file)
> {
> struct tpm_chip *chip;
> struct file_priv *priv;
> + int ret = 0;
>
> chip = container_of(inode->i_cdev, struct tpm_chip, cdev);
>
> /* It's assured that the chip will be opened just once,
> - * by the check of is_open variable, which is protected
> - * by driver_lock. */
> - if (test_and_set_bit(0, &chip->is_open)) {
> + * by the check of the chip reference count.
> + */
> + if (atomic_fetch_inc(&chip->refcount)) {

Use a refcount_t for all this

> @@ -39,6 +40,8 @@ static int tpmrm_release(struct inode *inode, struct file *file)
>
> tpm_common_release(file, fpriv);
> tpm2_del_space(fpriv->chip, &priv->space);
> + atomic_dec(&fpriv->chip->refcount);
> + wake_up_all(&fpriv->chip->waitq);

The usual pattern is

	if (refcount_dec_and_test(&fpriv->chip->refcount))
		wake_up_all(&fpriv->chip->waitq);

But this seems like madness: it blocks tpm_class_shutdown() until
userspace closes a file descriptor, and we can't do that.

You need to have tpm_class_shutdown() remove the ops from the still
open FD and have that FD start returning -EIO when the ops are gone,
which is what the ops lock is already for.

Jason