RE: [PATCH] tpm: Revert "tpm: fix invalid locking in NONBLOCKING mode"

From: Mario.Limonciello
Date: Tue May 26 2020 - 15:23:42 EST


> On Tue, 2020-05-26 at 13:32 -0500, Mario Limonciello wrote:
> > This reverts commit d23d12484307b40eea549b8a858f5fffad913897.
> >
> > This commit has caused regressions for the XPS 9560 containing
> > a Nuvoton TPM.
>
> Presumably this is using the tis driver?

Correct.

>
> > As mentioned by the reporter all TPM2 commands are failing with:
> > ERROR:tcti:src/tss2-tcti/tcti-device.c:290:tcti_device_receive()
> > Failed to read response from fd 3, got errno 1: Operation not
> > permitted
> >
> > The reporter bisected this issue back to this commit which was
> > backported to stable as commit 4d6ebc4.
>
> I think the problem is request_locality ... for some inexplicable
> reason a failure there returns -1, which is EPERM to user space.
>
> That seems to be a bug in the async code since everything else gives a
> ESPIPE error if tpm_try_get_ops fails ... at least no-one assumes it
> gives back a sensible return code.
>
> What I think is happening is that with the patch the TPM goes through a
> quick sequence of request, relinquish, request, relinquish and it's the
> third request which is failing (likely timing out). Without the patch,
> the patch there's only one request,relinquish cycle because the ops are
> held while the async work is executed. I have a vague recollection
> that there is a problem with too many locality request in quick
> succession, but I'll defer to Jason, who I think understands the
> intricacies of localities better than I do.

Thanks, I don't pretend to understand the nuances of this particular code,
but I was hoping that the request to revert got some attention since Alex's
kernel Bugzilla and message a few months ago to linux integrity weren't.

>
> If that's the problem, the solution looks simple enough: just move the
> ops get down because the priv state is already protected by the buffer
> mutex

Yeah, if that works for Alex's situation it certainly sounds like a better
solution than reverting this patch as this patch actually does fix a problem
reported by Jeffrin originally.

Could you propose a specific patch that Alex and Jeffrin can perhaps both try?

>
> James
>
> ---
>
> diff --git a/drivers/char/tpm/tpm-dev-common.c b/drivers/char/tpm/tpm-dev-
> common.c
> index 87f449340202..1784530b8387 100644
> --- a/drivers/char/tpm/tpm-dev-common.c
> +++ b/drivers/char/tpm/tpm-dev-common.c
> @@ -189,15 +189,6 @@ ssize_t tpm_common_write(struct file *file, const char
> __user *buf,
> goto out;
> }
>
> - /* atomic tpm command send and result receive. We only hold the ops
> - * lock during this period so that the tpm can be unregistered even if
> - * the char dev is held open.
> - */
> - if (tpm_try_get_ops(priv->chip)) {
> - ret = -EPIPE;
> - goto out;
> - }
> -
> priv->response_length = 0;
> priv->response_read = false;
> *off = 0;
> @@ -211,11 +202,19 @@ ssize_t tpm_common_write(struct file *file, const char
> __user *buf,
> if (file->f_flags & O_NONBLOCK) {
> priv->command_enqueued = true;
> queue_work(tpm_dev_wq, &priv->async_work);
> - tpm_put_ops(priv->chip);
> mutex_unlock(&priv->buffer_mutex);
> return size;
> }
>
> + /* atomic tpm command send and result receive. We only hold the ops
> + * lock during this period so that the tpm can be unregistered even if
> + * the char dev is held open.
> + */
> + if (tpm_try_get_ops(priv->chip)) {
> + ret = -EPIPE;
> + goto out;
> + }
> +
> ret = tpm_dev_transmit(priv->chip, priv->space, priv->data_buffer,
> sizeof(priv->data_buffer));
> tpm_put_ops(priv->chip);