Re: [PATCH] tee: optee: Add support for supplicant timeout

From: Sumit Garg
Date: Wed Dec 18 2024 - 02:30:49 EST


+ Erik

Hi Jens,

On Wed, 18 Dec 2024 at 12:27, Jens Wiklander <jens.wiklander@xxxxxxxxxx> wrote:
>
> Hi Sumit,
>
> On Fri, Dec 13, 2024 at 12:15 PM Sumit Garg <sumit.garg@xxxxxxxxxx> wrote:
> >
> > OP-TEE supplicant is a user-space daemon and it's possible for it
> > being crashed or killed in the middle of processing an OP-TEE RPC call.
> > It becomes more complicated when there is incorrect shutdown ordering
> > of the supplicant process vs the OP-TEE client application which can
> > eventually lead to system hang-up waiting for the closure of the client
> > application.
> >
> > In order to gracefully handle this scenario, let's add a long enough
> > timeout to wait for supplicant to process requests. In case there is a
> > timeout then we return a proper error code for the RPC request.
> >
> > Signed-off-by: Sumit Garg <sumit.garg@xxxxxxxxxx>
> > ---
> > drivers/tee/optee/supp.c | 58 +++++++++++++++++++++++++---------------
> > 1 file changed, 36 insertions(+), 22 deletions(-)
> >
> > diff --git a/drivers/tee/optee/supp.c b/drivers/tee/optee/supp.c
> > index 322a543b8c27..92e86ac4cdd4 100644
> > --- a/drivers/tee/optee/supp.c
> > +++ b/drivers/tee/optee/supp.c
> > @@ -7,6 +7,15 @@
> > #include <linux/uaccess.h>
> > #include "optee_private.h"
> >
> > +/*
> > + * OP-TEE supplicant timeout, the user-space supplicant may get
> > + * crashed or killed while servicing an RPC call. This will just lead
> > + * to OP-TEE client hung indefinitely just waiting for supplicant to
> > + * serve requests which isn't expected. It is rather expected to fail
> > + * gracefully with a timeout which is long enough.
> > + */
> > +#define SUPP_TIMEOUT (msecs_to_jiffies(10000))
> > +
> > struct optee_supp_req {
> > struct list_head link;
> >
> > @@ -52,8 +61,10 @@ void optee_supp_release(struct optee_supp *supp)
> >
> > /* Abort all queued requests */
> > list_for_each_entry_safe(req, req_tmp, &supp->reqs, link) {
> > - list_del(&req->link);
> > - req->in_queue = false;
> > + if (req->in_queue) {
> > + list_del(&req->link);
> > + req->in_queue = false;
> > + }
> > req->ret = TEEC_ERROR_COMMUNICATION;
> > complete(&req->c);
> > }
> > @@ -82,6 +93,7 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
> > struct optee_supp_req *req;
> > bool interruptable;
> > u32 ret;
> > + int res = 1;
> >
> > /*
> > * Return in case there is no supplicant available and
> > @@ -108,28 +120,28 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
> > /* Tell an eventual waiter there's a new request */
> > complete(&supp->reqs_c);
> >
> > - /*
> > - * Wait for supplicant to process and return result, once we've
> > - * returned from wait_for_completion(&req->c) successfully we have
> > - * exclusive access again.
> > - */
> > - while (wait_for_completion_interruptible(&req->c)) {
> > + /* Wait for supplicant to process and return result */
> > + while (res) {
> > + res = wait_for_completion_interruptible_timeout(&req->c,
> > + SUPP_TIMEOUT);
> > + /* Check if supplicant served the request */
> > + if (res > 0)
> > + break;
> > +
> > mutex_lock(&supp->mutex);
> > + /*
> > + * There's no supplicant available and since the supp->mutex
> > + * currently is held none can become available until the mutex
> > + * released again.
> > + *
> > + * Interrupting an RPC to supplicant is only allowed as a way
> > + * of slightly improving the user experience in case the
> > + * supplicant hasn't been started yet. During normal operation
> > + * the supplicant will serve all requests in a timely manner and
> > + * interrupting then wouldn't make sense.
> > + */
> > interruptable = !supp->ctx;
> > - if (interruptable) {
> > - /*
> > - * There's no supplicant available and since the
> > - * supp->mutex currently is held none can
> > - * become available until the mutex released
> > - * again.
> > - *
> > - * Interrupting an RPC to supplicant is only
> > - * allowed as a way of slightly improving the user
> > - * experience in case the supplicant hasn't been
> > - * started yet. During normal operation the supplicant
> > - * will serve all requests in a timely manner and
> > - * interrupting then wouldn't make sense.
> > - */
> > + if (interruptable || (res == 0)) {
>
> Are you fixing an observed problem or a theoretical one?

It is an observed problem, I was able to reproduce it using following
sequence with OP-TEE buildroot setup:

$ xtest 600 & // Run some secure storage tests using supplicant in
the background
$ kill -9 `pidof tee-supplicant` // Kill supplicant when the tests are
in progress.

This will cause the xtest to hang up.

> If the
> supplicant has died then "interruptable" is expected to be true so the
> timeout shouldn't matter.

When the supplicant dies, it doesn't lead to releasing the supplicant
context in the above test scenario. The reason is probably the
supplicant shared memory reference is held by OP-TEE which is in turn
is holding a reference to supplicant context.

-Sumit

>
> Cheers,
> Jens
>
> > if (req->in_queue) {
> > list_del(&req->link);
> > req->in_queue = false;
> > @@ -141,6 +153,8 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
> > req->ret = TEEC_ERROR_COMMUNICATION;
> > break;
> > }
> > + if (res == 0)
> > + req->ret = TEE_ERROR_TIMEOUT;
> > }
> >
> > ret = req->ret;
> > --
> > 2.43.0
> >