Re: [PATCH] driver core: Don't log intentional skip of device link creation as error

From: Nícolas F. R. A. Prado
Date: Tue Oct 15 2024 - 17:41:19 EST


On Mon, Oct 14, 2024 at 01:49:56PM +0100, Jon Hunter wrote:
> Hi Nicolas, Saravana,
>
> On 02/10/2024 21:57, Nícolas F. R. A. Prado wrote:
> > On Fri, Aug 09, 2024 at 12:13:25PM -0400, Nícolas F. R. A. Prado wrote:
> > > On Mon, Jul 29, 2024 at 05:08:48PM -0700, Saravana Kannan wrote:
> > > > On Mon, Jul 29, 2024 at 2:25 PM Nícolas F. R. A. Prado
> > > > <nfraprado@xxxxxxxxxxxxx> wrote:
> > > > >
> > > > > On Tue, Jun 25, 2024 at 09:56:07AM -0400, Nícolas F. R. A. Prado wrote:
> > > > > > On Mon, Jun 24, 2024 at 04:53:30PM -0700, Saravana Kannan wrote:
> > > > > > > On Mon, Jun 24, 2024 at 8:21 AM Nícolas F. R. A. Prado
> > > > > > > <nfraprado@xxxxxxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > Commit ac66c5bbb437 ("driver core: Allow only unprobed consumers for
> > > > > > > > SYNC_STATE_ONLY device links") introduced an early return in
> > > > > > > > device_link_add() to prevent useless links from being created. However
> > > > > > > > the calling function fw_devlink_create_devlink() unconditionally prints
> > > > > > > > an error if device_link_add() didn't create a link, even in this case
> > > > > > > > where it is intentionally skipping the link creation.
> > > > > > > >
> > > > > > > > Add a check to detect if the link wasn't created intentionally and in
> > > > > > > > that case don't log an error.
> > > > > > >
> > > > > > > Your point is somewhat valid, and I might Ack this. But this really
> > > > > > > shouldn't be happening a lot. Can you give more context on how you are
> > > > > > > hitting this?
> > > > > >
> > > > > > Of course. I'm seeing this on the mt8195-cherry-tomato-r2 platform.
> > > > > >
> > > > > > The following error is printed during boot:
> > > > > >
> > > > > > mediatek-drm-dp 1c500000.edp-tx: Failed to create device link (0x180) with backlight-lcd0
> > > > > >
> > > > > > It doesn't happen with the upstream defconfig, but with the following config
> > > > > > change it does:
> > > > > >
> > > > > > -CONFIG_PWM_MTK_DISP=m
> > > > > > +CONFIG_PWM_MTK_DISP=y
> > > > > >
> > > > > > That probably changes the order in which the MTK DP and the backlight drivers
> > > > > > probe, resulting in the error.
> > > > > >
> > > > > > One peculiarity that comes to mind is that the DP driver calls
> > > > > > devm_of_dp_aux_populate_bus() to run a callback once the panel has finished
> > > > > > probing. I'm not sure if this could have something to do with the error.
> > > > > >
> > > > > > Full log at https://lava.collabora.dev/scheduler/job/14573149
> > > > >
> > > > > Hi Saravana,
> > > > >
> > > > > With the given context for where this issue is happening, what do you think
> > > > > about this patch?
> > > >
> > > > Ah sorry, missed your earlier email.
> > > >
> > > > Couple of points:
> > > > 1. It looks like the link in question is "SYNC_STATE_ONLY" because
> > > > it's part of a cycle. Correct me if I'm wrong. You might want to use
> > > > the new "post-init-providers" property to help fw_devlink break the
> > > > cycle and enforce the right dependency between the edp-tx and your
> > > > backlight. And then this error should go away and your device ordering
> > > > is enforced a bit better.
> > >
> > > I don't see any cycle there. edp-tx points to backlight, but backlight doesn't
> > > point back (from mt8195-cherry.dtsi):
> > >
> > > &edp_tx {
> > >     ...
> > >     aux-bus {
> > >         panel {
> > >             compatible = "edp-panel";
> > >             power-supply = <&pp3300_disp_x>;
> > >             backlight = <&backlight_lcd0>;
> > >
> > > backlight_lcd0: backlight-lcd0 {
> > >     compatible = "pwm-backlight";
> > >     brightness-levels = <0 1023>;
> > >     default-brightness-level = <576>;
> > >     enable-gpios = <&pio 82 GPIO_ACTIVE_HIGH>;
> > >     num-interpolated-steps = <1023>;
> > >     pwms = <&disp_pwm0 0 500000>;
> > >     power-supply = <&ppvar_sys>;
> > > };
> > >
> > > And DL_FLAG_CYCLE is not set in the flags in the error log: 0x180. (Let me know
> > > if there's something else that I should be looking at to detect a cycle)
> >
> > Hi Saravana,
> >
> > Here are some debug logs to help contextualize the issue:
> >
> > [ 0.198518] device: 'backlight-lcd0': device_add
> > [ 0.198655] platform 1c500000.edp-tx: Linked as a sync state only consumer to backlight-lcd0
> > [ 34.971653] platform backlight-lcd0: error -EPROBE_DEFER: supplier 1100e000.pwm not ready
> > [ 35.115480] mediatek-drm-dp 1c500000.edp-tx: driver: 'mediatek-drm-dp': driver_bound: bound to device
> > [ 35.160115] mediatek-drm-dp 1c500000.edp-tx: Dropping the link to backlight-lcd0
> > [ 53.910433] pwm-backlight backlight-lcd0: driver: 'pwm-backlight': driver_bound: bound to device
> > [ 53.919213] mediatek-drm-dp 1c500000.edp-tx: Failed to create device link (0x180) with backlight-lcd0
> >
> > So a SYNC_STATE_ONLY device link is created with backlight-lcd0 as supplier
> > and edp-tx as consumer. When edp-tx probes, the link is dropped, since it is
> > SYNC_STATE_ONLY. When backlight-lcd0 later probes, a new device link to the
> > already-bound consumer edp-tx is attempted and skipped as useless, which
> > prints the error.
> >
> > You mentioned a cycle before. The only cycle I see is between the edp-tx and the
> > panel, but doesn't involve the backlight:
> >
> > [ 0.198104] ----- cycle: start -----
> > [ 0.198105] /soc/edp-tx@1c500000/aux-bus/panel: cycle: depends on /soc/edp-tx@1c500000
> > [ 0.198112] ----- cycle: start -----
> > [ 0.198113] /soc/edp-tx@1c500000/aux-bus/panel: cycle: child of /soc/edp-tx@1c500000
> > [ 0.198119] /soc/edp-tx@1c500000: cycle: depends on /soc/edp-tx@1c500000/aux-bus/panel
> > [ 0.198125] ----- cycle: end -----
> > [ 0.198126] platform 1c500000.edp-tx: Fixed dependency cycle(s) with /soc/edp-tx@1c500000/aux-bus/panel
> >
> > Just in case I tried using post-init-providers:
> >
> > diff --git a/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi b/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi
> > index 75d56b2d5a3d..19df138ef043 100644
> > --- a/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi
> > +++ b/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi
> > @@ -322,6 +322,7 @@ &edp_tx {
> >      pinctrl-names = "default";
> >      pinctrl-0 = <&edptx_pins_default>;
> > +    post-init-providers = <&panel>;
> >      ports {
> >          #address-cells = <1>;
> > @@ -344,7 +345,7 @@ edp_out: endpoint {
> >      };
> >      aux-bus {
> > -        panel {
> > +        panel: panel {
> >              compatible = "edp-panel";
> >              power-supply = <&pp3300_disp_x>;
> >              backlight = <&backlight_lcd0>;
> >
> > It broke the cycle, as I no longer see it in the logs, but the failed device
> > link warning is still there as expected.
> >
> > It seems to me that the issue comes from the device link being SYNC_STATE_ONLY
> > in the first place. I see that comes from the 'else' path in
> >
> >     if (con->fwnode == link->consumer)
> >         flags = fw_devlink_get_flags(link->flags);
> >     else
> >         flags = FW_DEVLINK_FLAGS_PERMISSIVE;
> >
> > and the value on each side of the comparison is:
> >
> > con->fwnode: /soc/edp-tx@1c500000
> > link->consumer: /soc/edp-tx@1c500000/aux-bus/panel
> >
> > Could you help me understand what (if anything) is wrong here?
> >
> > (I also noticed that as per the DT the consumer for backlight-lcd0 should be the
> > panel, but the devlink has it instead as the edp-tx, I'm guessing that's another
> > symptom of the same issue)
>
>
> I have not seen any update on this. It would be great to get this fixed.

Since there hasn't been a reply on this, let's postpone this investigation and
move forward with fixing the error log. I've sent v2 of the patch:
https://lore.kernel.org/all/20240624-fwdevlink-probed-no-err-45d21feb05fd-v2@xxxxxxxxxxxxx
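
For anyone reading the 0x180 in the error message, here is a minimal userspace
sketch that decodes device link flags into names. The bit positions are copied
from my reading of include/linux/device.h and should be treated as an
assumption to verify against your tree; dl_flags_to_str() and the name table
are hypothetical helpers for illustration, not kernel API.

```c
#include <stdio.h>
#include <string.h>

#define BIT(n) (1U << (n))

/* Assumed to mirror the DL_FLAG_* values in include/linux/device.h. */
#define DL_FLAG_STATELESS            BIT(0)
#define DL_FLAG_AUTOREMOVE_CONSUMER  BIT(1)
#define DL_FLAG_PM_RUNTIME           BIT(2)
#define DL_FLAG_RPM_ACTIVE           BIT(3)
#define DL_FLAG_AUTOREMOVE_SUPPLIER  BIT(4)
#define DL_FLAG_AUTOPROBE_CONSUMER   BIT(5)
#define DL_FLAG_MANAGED              BIT(6)
#define DL_FLAG_SYNC_STATE_ONLY      BIT(7)
#define DL_FLAG_INFERRED             BIT(8)
#define DL_FLAG_CYCLE                BIT(9)

/* Hypothetical lookup table, ordered from lowest to highest bit. */
static const struct {
	unsigned int bit;
	const char *name;
} dl_flag_names[] = {
	{ DL_FLAG_STATELESS,           "STATELESS" },
	{ DL_FLAG_AUTOREMOVE_CONSUMER, "AUTOREMOVE_CONSUMER" },
	{ DL_FLAG_PM_RUNTIME,          "PM_RUNTIME" },
	{ DL_FLAG_RPM_ACTIVE,          "RPM_ACTIVE" },
	{ DL_FLAG_AUTOREMOVE_SUPPLIER, "AUTOREMOVE_SUPPLIER" },
	{ DL_FLAG_AUTOPROBE_CONSUMER,  "AUTOPROBE_CONSUMER" },
	{ DL_FLAG_MANAGED,             "MANAGED" },
	{ DL_FLAG_SYNC_STATE_ONLY,     "SYNC_STATE_ONLY" },
	{ DL_FLAG_INFERRED,            "INFERRED" },
	{ DL_FLAG_CYCLE,               "CYCLE" },
};

/*
 * Hypothetical helper: write the names of all set flags into buf,
 * separated by '|', truncating if buf is too small. Returns buf.
 */
static char *dl_flags_to_str(unsigned int flags, char *buf, size_t len)
{
	size_t i;

	if (len == 0)
		return buf;
	buf[0] = '\0';

	for (i = 0; i < sizeof(dl_flag_names) / sizeof(dl_flag_names[0]); i++) {
		if (!(flags & dl_flag_names[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, dl_flag_names[i].name, len - strlen(buf) - 1);
	}
	return buf;
}
```

Calling dl_flags_to_str(0x180, buf, sizeof(buf)) with a 128-byte buffer should
yield "SYNC_STATE_ONLY|INFERRED" under these assumed values, which matches the
flags of the permissive link discussed above and shows that DL_FLAG_CYCLE is
not set.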

Thanks,
Nícolas