Re: [Freedreno] [PATCH 0/9] drm/msm: Avoid possible infinite probe deferral and speed booting

From: Jeffrey Hugo
Date: Mon Jul 13 2020 - 10:58:51 EST


On Mon, Jul 13, 2020 at 8:11 AM Rob Herring <robh+dt@xxxxxxxxxx> wrote:
>
> On Fri, Jul 10, 2020 at 5:02 PM Douglas Anderson <dianders@xxxxxxxxxxxx> wrote:
> >
> > I found that if I ever had a little mistake in my kernel config,
> > device tree, or graphics driver, my system would sit in a loop
> > at bootup trying again and again and again. An example log was:
>
> Why do we care about optimizing the error case?
>
> > msm ae00000.mdss: bound ae01000.mdp (ops 0xffffffe596e951f8)
> > msm_dsi ae94000.dsi: ae94000.dsi supply gdsc not found, using dummy regulator
> > msm_dsi_manager_register: failed to register mipi dsi host for DSI 0
> > [drm:ti_sn_bridge_probe] *ERROR* could not find any panel node
> > ...
> >
> > I finally tracked down where this was happening:
> > - msm_pdev_probe() is called.
> > - msm_pdev_probe() registers drivers. Registering drivers kicks
> > off processing of probe deferrals.
> > - component_master_add_with_match() could return -EPROBE_DEFER,
> > making msm_pdev_probe() return -EPROBE_DEFER.
> > - When msm_pdev_probe() returned, the processing of probe deferrals
> > happened.
> > - Loop back to the start.
> >
> > It looks like we can fix this by marking "mdss" as a "simple-bus".
> > I have no idea if people consider this the right thing to do or a
> > hack. Hopefully it's the right thing to do. :-)
>
> It's a simple test. Do the child devices have any dependency on the
> parent to probe and/or function? If so, not a simple-bus.
>
> > Once I do this, I notice that my boot gets marginally faster (you
> > don't need to probe the sub devices over and over) and also if I
>
> Can you quantify that?
>
> Have you run with devlinks enabled? You need a command line option to
> enable them. That too should reduce deferred probes.
>
> > have a problem, it doesn't loop forever (on my system it still
> > gets upset about some stuck clocks in that case, but at least I
> > can boot up).
>
> Deferred probe only runs when a device is added, so it's not like it
> is continually running.

But it is. I've hit this as well, but haven't attempted a fix.

So we have a parent device with several sub devices. The parent
device probes, which causes the sub devices to probe. One of the sub
devices probes successfully, and another fails with EPROBE_DEFER.
This both causes the probe defer framework to immediately schedule
processing of the probe defer queue, and also causes all of the child
devices and the parent device to be removed and deferred to probe
later. Since the system state doesn't change (one of the sub devices
actually requires an independent other device to have probed), the
system ends up in an infinite probe defer loop.