Re: [PATCH 4/4] of: property: Avoid linking devices with circular dependencies

From: Saravana Kannan
Date: Thu Apr 16 2020 - 16:57:44 EST


On Thu, Apr 16, 2020 at 9:01 AM Nicolas Saenz Julienne
<nsaenzjulienne@xxxxxxx> wrote:
>
> On Wed, 2020-04-15 at 11:52 -0700, Saravana Kannan wrote:
> > On Wed, Apr 15, 2020 at 8:06 AM Nicolas Saenz Julienne
> > <nsaenzjulienne@xxxxxxx> wrote:
> > > When creating a consumer/supplier relationship between devices it's
> > > essential to make sure they aren't supplying each other creating a
> > > circular dependency.
> >
> > Kinda correct. But fw_devlink is not just about optimizing probing.
> > It's also about ensuring sync_state() callbacks work correctly when
> > drivers are built as modules. And for that to work, circular
> > "SYNC_STATE_ONLY" device links are allowed. I've explained it in a bit
> > more detail here [1].
>
> Understood.
>
> [...]
>
> > This only catches circular links made out of 2 devices. If we really
> > needed such a function that worked correctly to catch bigger
> > "circles", you'd need to recurse and it'll get super wasteful and
> > ugly.
>
> Yeah, I was kind of expecting this reply :).
>
> > Thankfully, device_link_add() already checks for circular dependencies
> > when we need it and it's much cheaper because the links are at a
> > device level and not examined at a property level.
> >
> > Is this a real problem you are hitting with the Raspberry Pi 4's? If
> > so can you give an example in its DT where you are hitting this?
>
> So the DT bit that triggered all this series is in
> 'arch/arm/boot/dts/bcm283x.dtsi'. Namely the interaction between
> 'cprman@7e101000' and 'dsi@7e209000.' Both are clock providers and both are
> clock consumers of each other.
>
> Well I had a second deeper look at the issue, here is how the circular
> dependency breaks the boot process (A being soc, B being cprman and C being
> dsi):
>
> Device node A
> Device node B -> C
> Device node C -> B
>
> The probe sequence is the following (with DL_FLAG_AUTOPROBE_CONSUMER):
> 1. A device is added, the rest of devices are siblings, nothing is done
> 2. B device is added, C device doesn't exist, B is added to
> 'wait_for_suppliers' list with 'need_for_probe' flag set.
> 3. C device is added, B is picked up from 'wait_for_suppliers' list, device
> link created with B consuming from C.
> 4. C is then parsed, and tried to be linked with B as a consumer this time.
> This fails after testing for circular deps (by device_is_dependent()) during
> device_link_add(). This leaves C in the 'wait_for_suppliers' list *for ever*
> as every further attempt at add_link() on C will fail.
>
> -> Ultimately this prevents C for ever being probed, which also prevents B from
> being probed. Which isn't good as B is the main clock provider of the system.
>
> Note that B can live without C. I think some clock re-parenting will not be
> accessible, but that's all.
>
> > I'll have to NACK this patch for reasons mentioned above and in [1].
> > However, I think I have a solution that should work for what I'm
> > guessing is your real problem. But let me see the description of the
> > real scenario before I claim to have a solution.
>
> My intuition would be, upon getting a circular dep from device_is_dependent()
> with DL_FLAG_AUTOPROBE_CONSUMER to switch need_for_probe to false on both
> devices.

The problem with that is the devices will start trying to probe and
then defer due to other suppliers that are needed for probing but
haven't been linked yet. So it'll go a bit against what you are trying
to do. Also it doesn't solve the problem of already created links that
are wrong.

I'll send out a patch in reply to your email. I've been meaning to
send that outside of this discussion. It doesn't cover all cases of
cycles, but it'll cover most cases and I think it should fix your case
too.

For a more comprehensive fix, I'd like to do something like what I
explain here [1]. That should be doable for your driver too if you
want to try that approach. But I haven't heard Rob/Frank's opinion on
that.

-Saravana
[1] - https://lore.kernel.org/lkml/CAGETcx_2vdjSWc3BBN-N2WrtJP90ZnH-2vE=2iVuHuaE1YmMWQ@xxxxxxxxxxxxxx/