Re: [PATCH v2] soc: qcom: pdr: Fix the potential deadlock

From: Bjorn Andersson
Date: Thu Feb 06 2025 - 17:13:35 EST


On Wed, Jan 29, 2025 at 09:25:44PM +0530, Mukesh Ojha wrote:
> When a client process A calls pdr_add_lookup() to add a lookup for a
> service, it schedules the locator work. Later, a process B receives a
> new server packet indicating that the locator is up and calls
> pdr_locator_new_server(), which eventually sets
> pdr->locator_init_complete to true. Process A sees this, takes the list
> lock and queries the domain list, but the query times out due to a
> deadlock: the response is queued to the same qmi->wq, which is an
> ordered workqueue, and process B cannot complete the new server request
> work because it is blocked on the list lock.
>
>        Process A                        Process B
>
>                                    process_scheduled_works()
> pdr_add_lookup()                    qmi_data_ready_work()
>  process_scheduled_works()           pdr_locator_new_server()
>                                       pdr->locator_init_complete=true;
>    pdr_locator_work()
>     mutex_lock(&pdr->list_lock);
>
>      pdr_locate_service()              mutex_lock(&pdr->list_lock);
>
>       pdr_get_domain_list()
>        pr_err("PDR: %s get domain list
>                txn wait failed: %d\n",
>                req->service_name,
>                ret);
>
> Fix it by removing the unnecessary list iteration from
> pdr_locator_new_server(), since the locator work already iterates the
> list, and just call schedule_work() here.
>

I came to the same patch while looking into the issue related to
in-kernel pd-mapper reported here:
https://lore.kernel.org/lkml/Zqet8iInnDhnxkT9@xxxxxxxxxxxxxxxxxxxx/

So:
Reviewed-by: Bjorn Andersson <bjorn.andersson@xxxxxxxxxxxxxxxx>
Tested-by: Bjorn Andersson <bjorn.andersson@xxxxxxxxxxxxxxxx>

> Fixes: fbe639b44a82 ("soc: qcom: Introduce Protection Domain Restart helpers")
> CC: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Saranya R <quic_sarar@xxxxxxxxxxx>

Can we please use full names?

> Signed-off-by: Mukesh Ojha <mukesh.ojha@xxxxxxxxxxxxxxxx>

Unfortunately I can't merge this; Saranya's S-o-b comes first which
implies that she authored the patch, but you're listed as author.

Regards,
Bjorn

> ---
> Changes in v2:
> - Added Fixes tag.
>
> drivers/soc/qcom/pdr_interface.c | 8 +-------
> 1 file changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/drivers/soc/qcom/pdr_interface.c b/drivers/soc/qcom/pdr_interface.c
> index 328b6153b2be..71be378d2e43 100644
> --- a/drivers/soc/qcom/pdr_interface.c
> +++ b/drivers/soc/qcom/pdr_interface.c
> @@ -75,7 +75,6 @@ static int pdr_locator_new_server(struct qmi_handle *qmi,
>  {
>  	struct pdr_handle *pdr = container_of(qmi, struct pdr_handle,
>  					      locator_hdl);
> -	struct pdr_service *pds;
>
>  	mutex_lock(&pdr->lock);
>  	/* Create a local client port for QMI communication */
> @@ -87,12 +86,7 @@ static int pdr_locator_new_server(struct qmi_handle *qmi,
>  	mutex_unlock(&pdr->lock);
>
>  	/* Service pending lookup requests */
> -	mutex_lock(&pdr->list_lock);
> -	list_for_each_entry(pds, &pdr->lookups, node) {
> -		if (pds->need_locator_lookup)
> -			schedule_work(&pdr->locator_work);
> -	}
> -	mutex_unlock(&pdr->list_lock);
> +	schedule_work(&pdr->locator_work);
>
>  	return 0;
>  }
> --
> 2.34.1
>
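
For readers following the thread, here is a rough user-space sketch of why
a single schedule_work() call can stand in for the removed loop. It relies
on two things: scheduling an already-pending work item is a no-op, and, as
the commit message says, the locator work walks the lookup list itself.
Every mock_* name, old_new_server() and new_new_server() are hypothetical
stand-ins invented for this illustration, not kernel APIs, and locking is
left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct work_struct (not the kernel type). */
struct mock_work {
	bool pending;	/* queued but not yet executed */
	int run_count;	/* number of times the handler has run */
};

/* Hypothetical stand-in for one entry on the lookup list. */
struct mock_lookup {
	bool need_locator_lookup;
};

/* Models schedule_work(): queueing an already-pending item does nothing. */
static bool mock_schedule_work(struct mock_work *w)
{
	if (w->pending)
		return false;
	w->pending = true;
	return true;
}

/*
 * Models the worker: clear the pending flag, then walk the list and
 * serve every entry that still needs a lookup. Returns entries served.
 */
static int mock_run_pending(struct mock_work *w, struct mock_lookup *l, int n)
{
	int served = 0;

	assert(w && l);
	if (!w->pending)
		return 0;
	w->pending = false;
	w->run_count++;
	for (int i = 0; i < n; i++) {
		if (l[i].need_locator_lookup) {
			l[i].need_locator_lookup = false;
			served++;
		}
	}
	return served;
}

/*
 * Shape of the old handler: one schedule call per pending lookup.
 * The real code held list_lock around this walk; omitted here.
 * Returns how many schedule calls were issued.
 */
static int old_new_server(struct mock_work *w, const struct mock_lookup *l,
			  int n)
{
	int calls = 0;

	for (int i = 0; i < n; i++) {
		if (l[i].need_locator_lookup) {
			mock_schedule_work(w);
			calls++;
		}
	}
	return calls;
}

/* Shape of the patched handler: one call, no list walk, no list lock. */
static void new_new_server(struct mock_work *w)
{
	mock_schedule_work(w);
}
```

With two of three lookups pending, both handler shapes leave the work item
pending exactly once, and the worker serves the same two entries on its
single run. The one difference in the model is that the patched shape also
schedules the work when no lookup is pending; the worker then runs once
and finds nothing to serve.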