On Tue 25 May 14:48 CDT 2021, Siddharth Gupta wrote:
On 5/24/2021 8:03 PM, Bjorn Andersson wrote:
On Mon 17 May 18:08 CDT 2021, Siddharth Gupta wrote:

Subdevices at the beginning of the subdev list should have
higher priority than those at the end of the list. Reverse
traversal of the list causes priority inversion, which can
impact the performance of the device.

The subdev lists layers of the communication onion, we bring them up
inside out and we take them down outside in.

This stems from the primary idea that we want to be able to shut things
down cleanly (in the case of a stop) and we pass the "crashed" flag to
indicate to each recipient during "stop" that it may not rely on the
response of a lower layer.

As such, I don't think it's right to say that we have a priority
inversion.

My understanding of the topic was that each subdevice should be
independent of the other. In our case unfortunately the sysmon
subdevice depends on the glink endpoint.

We need to care for the ordering if sysmon is to be able to use smd or
glink to send the shutdown request.

Right, I meant the dependence of either sysmon or SSR is on QMI,
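
The "onion" ordering described above can be sketched in plain C. This is only an illustration under stated assumptions: struct layer, log_event(), start_layers() and stop_layers() are hypothetical names, not the remoteproc API, and the glink, sysmon, ssr order is taken from the example in the patch description.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct rproc_subdev; not the kernel type. */
struct layer {
	const char *name;
};

/* Append "name " to a caller-provided log so the order can be inspected. */
static void log_event(char *log, size_t cap, const char *name)
{
	/* Room is needed for the name, one space and the terminating NUL. */
	assert(strlen(log) + strlen(name) + 2 <= cap);
	strcat(log, name);
	strcat(log, " ");
}

/* Bring layers up inside out: index 0 is the innermost layer. */
static void start_layers(const struct layer *l, size_t n, char *log, size_t cap)
{
	for (size_t i = 0; i < n; i++)
		log_event(log, cap, l[i].name);
}

/* Take layers down outside in: the last layer started is stopped first. */
static void stop_layers(const struct layer *l, size_t n, char *log, size_t cap)
{
	for (size_t i = n; i-- > 0; )
		log_event(log, cap, l[i].name);
}
```

With the layers registered as glink, sysmon, ssr, the start order follows the list and the stop order is its mirror image, which is the behaviour the existing reverse traversal provides.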
Yes, exactly.
However the priority inversion doesn't happen in these
subdevices, it happens due to the SSR notifications that we send
to kernel clients. In this case kernel clients also can have QMI
sockets that in turn depend on the glink endpoint, which means
when they go to release the QMI socket a broadcast will be sent
out to all connected clients about the closure of the connection
which in this case happens to be the remoteproc which died. So
if we peel the onion, we will unnecessarily be waiting for a
dead remoteproc.

I see, that is indeed a problem.
For example a device adds the glink, sysmon and ssr subdevs
to its list. During a crash the ssr notification would go
before the glink and sysmon notifications. This can cause a
degraded response when a client driver waits for a response
from the crashed rproc.

In general the design is such that components are not expected to
communicate with the crashed remote when "crashed" is set, this avoids
the single-remote crash.

Here the glink device on the rpmsg bus won't know about the
crashed remoteproc till we send glink notification first, right?
Since we send out sysmon and SSR notifications first, the glink
device will still be "alive" on the rpmsg bus.

Yes, and this all stems from the design that everything communicating
over glink is a child of glink, which isn't the case when you have an SSR
event that will end up blocking the sequence in qrtr.

For sysmon this is not a problem, because sysmon is implemented to not
attempt to communicate with the parent remoteproc upon a crash.

And all rpmsg devices will be torn down as a result of glink being torn
down, so glink can fail early based on this (not sure if this was
implemented downstream though).

This was implemented downstream as a part of an early
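
The "crashed" convention discussed above can be illustrated with a small stop callback. This is a minimal sketch, not the sysmon implementation: struct demo_subdev, its counter and demo_stop() are hypothetical, and only the (subdev, crashed) argument shape is borrowed from the subdev->stop() calls in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subdevice state, for illustration only. */
struct demo_subdev {
	int shutdown_requests_sent;	/* messages sent to the remote */
};

/* Stop callback shaped like the subdev->stop(subdev, crashed) calls. */
static void demo_stop(struct demo_subdev *sd, bool crashed)
{
	assert(sd != NULL);

	/* The remote is dead: do not send anything or wait for an ack. */
	if (crashed)
		return;

	/* Clean stop: the remote is alive, so ask it to shut down. */
	sd->shutdown_requests_sent++;
}
```

A listener written this way never blocks on a dead remoteproc, regardless of where it sits in the subdev list.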
The case where this isn't holding up is when two remote processors
crash simultaneously, in which case e.g. sysmon has been seen hitting
its timeout waiting for an ack from a dead remoteproc - but I was under
the impression that this window shrunk dramatically as a side effect of
us fixing the notification ordering.

You are right, the window would become smaller in the case of two
remoteprocs, but this issue can come up with even a single
remoteproc unless we prioritize certain subdevices.

The problem that you describe where an SSR notification will directly or
indirectly attempt to communicate over QRTR will certainly cause issues
in the single-rproc case as well.

But is there any reason why these listeners have to do the wrong thing at
stop(crashed=true)?

I don't think the listeners are doing anything wrong by closing
I say unprepare in any order might not make a difference because
Signed-off-by: Siddharth Gupta <sidgup@xxxxxxxxxxxxxx>
---
drivers/remoteproc/remoteproc_core.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)
diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index 626a6b90f..ac8fc42 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1167,7 +1167,7 @@ static int rproc_handle_resources(struct rproc *rproc,
static int rproc_prepare_subdevices(struct rproc *rproc)
{
- struct rproc_subdev *subdev;
+ struct rproc_subdev *subdev, *itr;
int ret;
list_for_each_entry(subdev, &rproc->subdevs, node) {
@@ -1181,9 +1181,11 @@ static int rproc_prepare_subdevices(struct rproc *rproc)
return 0;
unroll_preparation:
- list_for_each_entry_continue_reverse(subdev, &rproc->subdevs, node) {
- if (subdev->unprepare)
- subdev->unprepare(subdev);
+ list_for_each_entry(itr, &rproc->subdevs, node) {
+ if (itr == subdev)
+ break;
+ if (itr->unprepare)
+ itr->unprepare(itr);
}
return ret;
@@ -1191,7 +1193,7 @@ static int rproc_prepare_subdevices(struct rproc *rproc)
static int rproc_start_subdevices(struct rproc *rproc)
{
- struct rproc_subdev *subdev;
+ struct rproc_subdev *subdev, *itr;
int ret;
list_for_each_entry(subdev, &rproc->subdevs, node) {
@@ -1205,9 +1207,11 @@ static int rproc_start_subdevices(struct rproc *rproc)
return 0;
unroll_registration:
- list_for_each_entry_continue_reverse(subdev, &rproc->subdevs, node) {
- if (subdev->stop)
- subdev->stop(subdev, true);
+ list_for_each_entry(itr, &rproc->subdevs, node) {
+ if (itr == subdev)
+ break;
+ if (itr->stop)
+ itr->stop(itr, true);
}
return ret;
@@ -1217,7 +1221,7 @@ static void rproc_stop_subdevices(struct rproc *rproc, bool crashed)
{
struct rproc_subdev *subdev;
- list_for_each_entry_reverse(subdev, &rproc->subdevs, node) {
+ list_for_each_entry(subdev, &rproc->subdevs, node) {

I presume this is the case you actually care about, can you help me
understand if you changed the others for consistency or if there's some
flow of events where that might be necessary.

Yes you are right, I only changed the others for consistency.
However, I will give this more thought and see if unprepare in
the reverse order can make a difference.

Per above argument I don't think things depend on the unrolling on error
happening in reverse order. But it's idiomatic.

Regards,
Bjorn

Thanks,
Sid

Regards,
Bjorn
if (subdev->stop)
subdev->stop(subdev, crashed);
}
@@ -1227,7 +1231,7 @@ static void rproc_unprepare_subdevices(struct rproc *rproc)
{
struct rproc_subdev *subdev;
- list_for_each_entry_reverse(subdev, &rproc->subdevs, node) {
+ list_for_each_entry(subdev, &rproc->subdevs, node) {
if (subdev->unprepare)
subdev->unprepare(subdev);
}
--
Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
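
The two unroll styles debated in the thread visit the same set of entries, namely everything before the one that failed, and differ only in direction. The sketch below uses an array instead of the kernel list and is an assumption-laden illustration: append(), unroll_reverse() and unroll_forward() are hypothetical helpers, and `failed` stands for the index of the subdevice whose prepare or start callback returned an error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append "name " to a caller-provided log so the order can be inspected. */
static void append(char *log, size_t cap, const char *name)
{
	/* Room is needed for the name, one space and the terminating NUL. */
	assert(strlen(log) + strlen(name) + 2 <= cap);
	strcat(log, name);
	strcat(log, " ");
}

/*
 * Reverse unroll: walk back from the entry just before the failing one,
 * in the spirit of list_for_each_entry_continue_reverse().
 */
static void unroll_reverse(const char *const *names, size_t failed,
			   char *log, size_t cap)
{
	for (size_t i = failed; i-- > 0; )
		append(log, cap, names[i]);
}

/*
 * Forward unroll: walk from the head and stop at the failing entry,
 * in the spirit of the patched loop that breaks when itr == subdev.
 */
static void unroll_forward(const char *const *names, size_t failed,
			   char *log, size_t cap)
{
	for (size_t i = 0; i < failed; i++)
		append(log, cap, names[i]);
}
```

If the third of three subdevices fails, both helpers undo the first two; the reverse variant undoes the most recently set-up one first, which is the idiomatic ordering referred to in the review.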