Re: [PATCH v3] firmware: arm_scmi: Free mailbox channels if probe fails

From: rishabhb
Date: Mon Aug 30 2021 - 17:09:51 EST


Hi Christian
There seems to be another issue here. The response from agent can be delayed causing a timeout during base protocol acquire,
which leads to the probe failure. What I have observed is sometimes the failure of probe and rx_callback (due to a delayed message)
happens at the same time on different cpus.
Because of this race, the device memory may be cleared while the interrupt(rx_callback) is executing on another cpu.
How do you propose we solve this? Do you think it is better to take the setting up of base and other protocols out of probe and
in some delayed work? That would imply the device memory is not released until remove is called. Or should we add locking to
the interrupt handler(scmi_rx_callback) and the cleanup in probe to avoid the race?

On 2021-08-05 03:54, Cristian Marussi wrote:
On Wed, Aug 04, 2021 at 02:19:59PM -0700, Rishabh Bhatnagar wrote:
Mailbox channels for the base protocol are setup during probe.
There can be a scenario where probe fails to acquire the base
protocol due to a timeout leading to cleaning up of all device
managed memory including the scmi_mailbox structure setup during
mailbox_chan_setup function.
[ 12.735104]arm-scmi soc:qcom,scmi: timed out in resp(caller: version_get+0x84/0x140)
[ 12.735224]arm-scmi soc:qcom,scmi: unable to communicate with SCMI
[ 12.735947]arm-scmi: probe of soc:qcom,scmi failed with error -110

Now when a message arrives at cpu slightly after the timeout, the mailbox
controller will try to call the rx_callback of the client and might end
up accessing freed memory.
[ 12.758363][ C0] Call trace:
[ 12.758367][ C0] rx_callback+0x24/0x160
[ 12.758372][ C0] mbox_chan_received_data+0x44/0x94
[ 12.758386][ C0] __handle_irq_event_percpu+0xd4/0x240
This patch frees the mailbox channels setup during probe and adds some more
error handling in case the probe fails.

Signed-off-by: Rishabh Bhatnagar <rishabhb@xxxxxxxxxxxxxx>
---
drivers/firmware/arm_scmi/driver.c | 35 ++++++++++++++++++++++++-----------
1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c
index 9b2e8d4..ead3bd3 100644
--- a/drivers/firmware/arm_scmi/driver.c
+++ b/drivers/firmware/arm_scmi/driver.c
@@ -1390,6 +1390,21 @@ void scmi_protocol_device_unrequest(const struct scmi_device_id *id_table)
mutex_unlock(&scmi_requested_devices_mtx);
}


Hi,

+static int scmi_cleanup_txrx_channels(struct scmi_info *info)
+{
+ int ret;
+ struct idr *idr = &info->tx_idr;
+
+ ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
+ idr_destroy(&info->tx_idr);
+
+ idr = &info->rx_idr;
+ ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
+ idr_destroy(&info->rx_idr);
+
+ return ret;
+}
+
static int scmi_probe(struct platform_device *pdev)
{
int ret;
@@ -1430,7 +1445,7 @@ static int scmi_probe(struct platform_device *pdev)

ret = scmi_xfer_info_init(info);
if (ret)
- return ret;
+ goto clear_txrx_setup;

if (scmi_notification_init(handle))
dev_err(dev, "SCMI Notifications NOT available.\n");
@@ -1443,7 +1458,7 @@ static int scmi_probe(struct platform_device *pdev)
ret = scmi_protocol_acquire(handle, SCMI_PROTOCOL_BASE);
if (ret) {
dev_err(dev, "unable to communicate with SCMI\n");
- return ret;
+ goto notification_exit;
}

mutex_lock(&scmi_list_mutex);
@@ -1482,6 +1497,12 @@ static int scmi_probe(struct platform_device *pdev)
}

return 0;
+
+notification_exit:
+ scmi_notification_exit(&info->handle);
+clear_txrx_setup:
+ scmi_cleanup_txrx_channels(info);
+ return ret;
}

void scmi_free_channel(struct scmi_chan_info *cinfo, struct idr *idr, int id)
@@ -1493,7 +1514,6 @@ static int scmi_remove(struct platform_device *pdev)
{
int ret = 0, id;
struct scmi_info *info = platform_get_drvdata(pdev);
- struct idr *idr = &info->tx_idr;
struct device_node *child;

mutex_lock(&scmi_list_mutex);
@@ -1517,14 +1537,7 @@ static int scmi_remove(struct platform_device *pdev)
idr_destroy(&info->active_protocols);

/* Safe to free channels since no more users */
- ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
- idr_destroy(&info->tx_idr);
-
- idr = &info->rx_idr;
- ret = idr_for_each(idr, info->desc->ops->chan_free, idr);
- idr_destroy(&info->rx_idr);
-
- return ret;
+ return scmi_cleanup_txrx_channels(info);
}


Looks good to me.

Reviewed-by: Cristian Marussi <cristian.marussi@xxxxxxx>
Tested-by: Cristian Marussi <cristian.marussi@xxxxxxx>
(Re-tested on for-next/scmi on top of virtio-scmi series)

Thanks,
Cristian