Re: [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path

From: Viresh Kumar
Date: Thu Oct 22 2020 - 04:08:16 EST


On 20-10-20, 07:13, Rob Clark wrote:
> On Tue, Oct 20, 2020 at 4:24 AM Viresh Kumar <viresh.kumar@xxxxxxxxxx> wrote:
> >
> > On 20-10-20, 12:56, Daniel Vetter wrote:
> > > Yeah that's bad practice. Generally you shouldn't need to hold locks
> > > in setup/teardown code, since there's no other thread which can
> > > possible hold a reference to anything your touching anymore. Ofc
> > > excluding quickly grabbing/dropping a lock to insert/remove objects
> > > into lists and stuff.
> > >
> > > The other reason is that especially with anything related to sysfs or
> > > debugfs, the locking dependencies you're pulling in are enormous: vfs
> > > locks pull in mm locks (due to mmap) and at that point there's pretty
> > > much nothing left you're allowed to hold while acquiring such a lock.
> > > For simple drivers this is no issue, but for fancy drivers (like gpu
> > > drivers) which need to interact with core mm) this means your
> > > subsystem is a major pain to use.
> > >
> > > Usually the correct fix is to only hold your subsystem locks in
> > > setup/teardown when absolutely required, and fix any data
> > > inconsistency issues by reordering your setup/teardown code: When you
> > > register as the last step and unregister as the first step, there's no
> > > need for any additional locking. And hence no need to call debugfs
> > > functions while holding your subsystem locks.
> > >
> > > The catch phrase I use for this is "don't solve object lifetime issues
> > > with locking". Instead use refcounting and careful ordering in
> > > setup/teardown code.
> >
> > This is exactly what I have done in the OPP core, the locks were taken
> > only when really necessary, though as we have seen now I have missed
> > that at a single place and that should be fixed as well. Will do that,
> > thanks.
>
> I do have an easy enough way to repro the issue, so if you have a
> patch I can certainly test it.

Does this fix it for you ? There is one more place still left where we
are taking the opp_table_lock while adding stuff to debugfs and that's
not that straight forward to fix. But I didn't see that path in your
circular dependency trace, so who knows :)

diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index 2483e765318a..4cc0fb716381 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -1181,6 +1181,10 @@ static void _opp_table_kref_release(struct kref *kref)
struct opp_device *opp_dev, *temp;
int i;

+ /* Drop the lock as soon as we can */
+ list_del(&opp_table->node);
+ mutex_unlock(&opp_table_lock);
+
_of_clear_opp_table(opp_table);

/* Release clk */
@@ -1208,10 +1212,7 @@ static void _opp_table_kref_release(struct kref *kref)

mutex_destroy(&opp_table->genpd_virt_dev_lock);
mutex_destroy(&opp_table->lock);
- list_del(&opp_table->node);
kfree(opp_table);
-
- mutex_unlock(&opp_table_lock);
}

void dev_pm_opp_put_opp_table(struct opp_table *opp_table)

--
viresh