Re: [PATCH] remoteproc: Create a separate workqueue for recovery tasks

From: Alex Elder
Date: Thu Dec 17 2020 - 11:13:55 EST


On 12/15/20 4:55 PM, Bjorn Andersson wrote:
> On Sat 12 Dec 14:48 CST 2020, Rishabh Bhatnagar wrote:
>
>> Create an unbound high priority workqueue for recovery tasks.

I have been looking at a different issue that is caused by
crash notification.

What happened was that the modem crashed while the AP was
in system suspend (or possibly even resuming) state. And
there is no guarantee that the system will have called a
driver's ->resume callback when the crash notification is
delivered.

In my case (in the IPA driver), handling a modem crash
cannot be done while the driver is suspended; i.e. the
activities in its ->resume callback must be completed
before we can recover from the crash.

For this reason I might like to change the way the
crash notification is handled, but what I'd rather see
is to have the work queue not run until user space
is unfrozen, which would guarantee that all drivers
that have registered for a crash notification will
be resumed when the notification arrives.

I'm not sure how that interacts with what you are
looking for here. I think the workqueue could still
be unbound, but its work would be delayed longer before
any notification (and recovery) started.
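
A minimal userspace sketch of that ordering follows. It is a model, not
kernel code, and every model_* name in it is hypothetical. It assumes the
queue holds work submitted while frozen and runs it only at thaw, after
drivers have resumed. In the kernel, work on a workqueue allocated with
WQ_FREEZABLE likewise does not start executing until the queue is thawed.

```c
/* Userspace model only, not kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_PENDING 4

typedef void (*model_work_fn)(void *ctx);

struct model_wq {
	bool frozen;			/* true from freeze until thaw */
	size_t npending;
	model_work_fn fn[MODEL_MAX_PENDING];
	void *ctx[MODEL_MAX_PENDING];
};

struct model_driver {
	bool resumed;			/* set by the ->resume stand-in */
	bool handled_while_resumed;	/* what the crash handler observed */
};

/* Run the item immediately, or hold it if the queue is frozen. */
static bool model_queue_work(struct model_wq *wq, model_work_fn fn, void *ctx)
{
	if (!wq->frozen) {
		fn(ctx);
		return true;
	}
	if (wq->npending == MODEL_MAX_PENDING)
		return false;
	wq->fn[wq->npending] = fn;
	wq->ctx[wq->npending] = ctx;
	wq->npending++;
	return true;
}

/* Thaw: run everything that was held back, in submission order. */
static void model_thaw(struct model_wq *wq)
{
	wq->frozen = false;
	for (size_t i = 0; i < wq->npending; i++)
		wq->fn[i](wq->ctx[i]);
	wq->npending = 0;
}

/* Crash handler stand-in: records whether the driver was resumed. */
static void model_crash_handler(void *ctx)
{
	struct model_driver *drv = ctx;

	drv->handled_while_resumed = drv->resumed;
}

/*
 * A crash is reported while the system is suspended. Returns true if the
 * handler ran only after the driver's ->resume stand-in had completed.
 */
static bool model_crash_scenario(bool freezable)
{
	struct model_wq wq = { .frozen = freezable };
	struct model_driver drv = { 0 };

	model_queue_work(&wq, model_crash_handler, &drv);
	drv.resumed = true;	/* drivers resume first ... */
	model_thaw(&wq);	/* ... then the queue is thawed */
	return drv.handled_while_resumed;
}
```

model_crash_scenario(false) reproduces the problem case, where the handler
runs before the ->resume stand-in; model_crash_scenario(true) holds the
handler until after it.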

-Alex



> This simply repeats $subject
>
>> Recovery time is an important parameter for a subsystem and there
>> might be situations where multiple subsystems crash around the same
>> time. Scheduling into an unbound workqueue increases parallelization
>> and avoids time impact.
>
> You should be able to write this more succinctly. The important part is
> that you want an unbound work queue to allow recovery to happen in
> parallel - which naturally implies that you care about recovery latency.

>> Also creating a high priority workqueue
>> will utilize separate worker threads with higher priority (lower
>> nice values) than normal ones.


> This doesn't describe why you need the higher priority.
>
> I believe, and certainly with the in-line coredump, that we're running
> our recovery work for way too long to be queued on the system_wq. As
> such the content of the patch looks good!
>
> Regards,
> Bjorn

>> Signed-off-by: Rishabh Bhatnagar <rishabhb@xxxxxxxxxxxxxx>
>> ---
>>  drivers/remoteproc/remoteproc_core.c | 9 ++++++++-
>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
>> index 46c2937..8fd8166 100644
>> --- a/drivers/remoteproc/remoteproc_core.c
>> +++ b/drivers/remoteproc/remoteproc_core.c
>> @@ -48,6 +48,8 @@ static DEFINE_MUTEX(rproc_list_mutex);
>>  static LIST_HEAD(rproc_list);
>>  static struct notifier_block rproc_panic_nb;
>>
>> +static struct workqueue_struct *rproc_wq;
>> +
>>  typedef int (*rproc_handle_resource_t)(struct rproc *rproc,
>>  				 void *, int offset, int avail);
>>
>> @@ -2475,7 +2477,7 @@ void rproc_report_crash(struct rproc *rproc, enum rproc_crash_type type)
>>  		rproc->name, rproc_crash_to_string(type));
>>
>>  	/* create a new task to handle the error */
>> -	schedule_work(&rproc->crash_handler);
>> +	queue_work(rproc_wq, &rproc->crash_handler);
>>  }
>>  EXPORT_SYMBOL(rproc_report_crash);
>>
>> @@ -2520,6 +2522,10 @@ static void __exit rproc_exit_panic(void)
>>
>>  static int __init remoteproc_init(void)
>>  {
>> +	rproc_wq = alloc_workqueue("rproc_wq", WQ_UNBOUND | WQ_HIGHPRI, 0);
>> +	if (!rproc_wq)
>> +		return -ENOMEM;
>> +
>>  	rproc_init_sysfs();
>>  	rproc_init_debugfs();
>>  	rproc_init_cdev();
>> @@ -2536,6 +2542,7 @@ static void __exit remoteproc_exit(void)
>>  	rproc_exit_panic();
>>  	rproc_exit_debugfs();
>>  	rproc_exit_sysfs();
>> +	destroy_workqueue(rproc_wq);
>>  }
>>  module_exit(remoteproc_exit);
>>
>> --
>> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
>> a Linux Foundation Collaborative Project