Re: [PATCH v1] misc: fastrpc: Trigger a panic using BUG_ON in device release

From: Greg KH
Date: Tue Jul 30 2024 - 03:16:59 EST


On Tue, Jul 30, 2024 at 12:39:45PM +0530, Abhishek Singh wrote:
> When the user process on ARM closes the device node while closing the
> session, it triggers a remote call to terminate the PD running on the
> DSP. If the DSP is in an unstable state and cannot process the remote
> request from the HLOS, glink fails to deliver the kill request to the
> DSP, resulting in a timeout error. Currently, this error is ignored,
> and the session is closed, causing all the SMMU mappings associated
> with that specific PD to be removed. However, since the PD is still
> operational on the DSP, any attempt to access these SMMU mappings
> results in an SMMU fault, leading to a panic. As the SMMU mappings
> have already been removed, there is no available information on the
> DSP to determine the root cause of its unresponsiveness to remote
> calls. As the DSP is unresponsive to all process remote calls, use
> BUG_ON to prevent the removal of SMMU mappings and to properly
> identify the root cause of the DSP’s unresponsiveness to the remote
> calls.
>
> Signed-off-by: Abhishek Singh <quic_abhishes@xxxxxxxxxxx>
> ---
> drivers/misc/fastrpc.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
> index 5204fda51da3..bac9c749564c 100644
> --- a/drivers/misc/fastrpc.c
> +++ b/drivers/misc/fastrpc.c
> @@ -97,6 +97,7 @@
> #define FASTRPC_RMID_INIT_CREATE_STATIC 8
> #define FASTRPC_RMID_INIT_MEM_MAP 10
> #define FASTRPC_RMID_INIT_MEM_UNMAP 11
> +#define PROCESS_KILL_SC 0x01010000
>
> /* Protection Domain(PD) ids */
> #define ROOT_PD (0)
> @@ -1128,6 +1129,9 @@ static int fastrpc_invoke_send(struct fastrpc_session_ctx *sctx,
> fastrpc_context_get(ctx);
>
> ret = rpmsg_send(cctx->rpdev->ept, (void *)msg, sizeof(*msg));
> + /* trigger panic if glink communication is broken and the message is for PD kill */
> + BUG_ON((ret == -ETIMEDOUT) && (handle == FASTRPC_INIT_HANDLE) &&
> + (ctx->sc == PROCESS_KILL_SC));

You just crashed the machine completely. Sorry, but no: handle the
issue properly and clean up if you can detect it. Do not break systems.

greg k-h
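
One possible reading of "properly handle the issue", as a userspace
sketch rather than kernel code: detect the timeout, keep the mappings
in place because the DSP may still be using them, and return the error
to the caller instead of calling BUG_ON. Every name below
(demo_session, demo_send_kill, demo_release) is hypothetical and does
not exist in drivers/misc/fastrpc.c. Keeping the mappings is an
assumption about what a safe cleanup could look like, not something
stated in this thread.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a per-process session (not struct fastrpc_user). */
struct demo_session {
	bool mappings_live;	/* SMMU mappings are still installed */
	bool quarantined;	/* kept alive: the DSP never acknowledged the kill */
};

/* Hypothetical transport stub: returns 0 on success or a negative errno. */
static int demo_send_kill(bool dsp_responsive)
{
	return dsp_responsive ? 0 : -ETIMEDOUT;
}

/* Release path: tear down mappings only when the kill request was delivered. */
static int demo_release(struct demo_session *s, bool dsp_responsive)
{
	int ret = demo_send_kill(dsp_responsive);

	if (ret == -ETIMEDOUT) {
		/* The PD may still access the buffers, so leave the mappings alone. */
		s->quarantined = true;
		fprintf(stderr, "demo: PD kill timed out, keeping mappings\n");
		return ret;
	}

	s->mappings_live = false;
	return ret;
}
```

The trade-off in this sketch is that a quarantined session holds its
memory until the DSP is restarted or recovered, but the rest of the
system keeps running and the mappings remain available for debugging.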