Re: [PATCH] driver core: Don't log intentional skip of device link creation as error

From: Jon Hunter
Date: Mon Oct 14 2024 - 08:50:36 EST


Hi Nicolas, Saravanna,

On 02/10/2024 21:57, Nícolas F. R. A. Prado wrote:
On Fri, Aug 09, 2024 at 12:13:25PM -0400, Nícolas F. R. A. Prado wrote:
On Mon, Jul 29, 2024 at 05:08:48PM -0700, Saravana Kannan wrote:
On Mon, Jul 29, 2024 at 2:25 PM Nícolas F. R. A. Prado
<nfraprado@xxxxxxxxxxxxx> wrote:

On Tue, Jun 25, 2024 at 09:56:07AM -0400, Nícolas F. R. A. Prado wrote:
On Mon, Jun 24, 2024 at 04:53:30PM -0700, Saravana Kannan wrote:
On Mon, Jun 24, 2024 at 8:21 AM Nícolas F. R. A. Prado
<nfraprado@xxxxxxxxxxxxx> wrote:

Commit ac66c5bbb437 ("driver core: Allow only unprobed consumers for
SYNC_STATE_ONLY device links") introduced an early return in
device_link_add() to prevent useless links from being created. However
the calling function fw_devlink_create_devlink() unconditionally prints
an error if device_link_add() didn't create a link, even in this case
where it is intentionally skipping the link creation.

Add a check to detect if the link wasn't created intentionally and in
that case don't log an error.

Your point is somewhat valid, and I might Ack this. But this really
shouldn't be happening a lot. Can you give more context on how you are
hitting this?

Of course. I'm seeing this on the mt8195-cherry-tomato-r2 platform.

The following error is printed during boot:

mediatek-drm-dp 1c500000.edp-tx: Failed to create device link (0x180) with backlight-lcd0

It doesn't happen with the upstream defconfig, but with the following config
change it does:

-CONFIG_PWM_MTK_DISP=m
+CONFIG_PWM_MTK_DISP=y

That probably changes the order in which the MTK DP and the backlight drivers
probe, resulting in the error.

One peculiarity that comes to mind is that the DP driver calls
devm_of_dp_aux_populate_bus() to run a callback once the panel has finished
probing. I'm not sure if this could have something to do with the error.

Full log at https://lava.collabora.dev/scheduler/job/14573149

Hi Saravana,

With the given context for where this issue is happening, what do you think
about this patch?

Ah sorry, missed your earlier email.

Couple of points:
1. It looks like the link in question is "SYNC_STATE_ONLY" because
it's part of a cycle. Correct me if I'm wrong. You might want to use
the new "post-init-providers" property to help fw_devlink break the
cycle and enforce the right dependency between the edp-tx and your
backlight. And then this error should go away and your device ordering
is enforced a bit better.

I don't see any cycle there. edp-tx points to backlight, but backlight doesn't
point back (from mt8195-cherry.dtsi):

&edp_tx {
...
aux-bus {
panel {
compatible = "edp-panel";
power-supply = <&pp3300_disp_x>;
backlight = <&backlight_lcd0>;

backlight_lcd0: backlight-lcd0 {
compatible = "pwm-backlight";
brightness-levels = <0 1023>;
default-brightness-level = <576>;
enable-gpios = <&pio 82 GPIO_ACTIVE_HIGH>;
num-interpolated-steps = <1023>;
pwms = <&disp_pwm0 0 500000>;
power-supply = <&ppvar_sys>;
};

And DL_FLAG_CYCLE is not set in the flags in the error log: 0x180. (Let me know
if there's something else that I should be looking at to detect a cycle)

Hi Saravana,

Here are some debug logs to help contextualize the issue:

[ 0.198518] device: 'backlight-lcd0': device_add
[ 0.198655] platform 1c500000.edp-tx: Linked as a sync state only consumer to backlight-lcd0
[ 34.971653] platform backlight-lcd0: error -EPROBE_DEFER: supplier 1100e000.pwm not ready
[ 35.115480] mediatek-drm-dp 1c500000.edp-tx: driver: 'mediatek-drm-dp': driver_bound: bound to device
[ 35.160115] mediatek-drm-dp 1c500000.edp-tx: Dropping the link to backlight-lcd0
[ 53.910433] pwm-backlight backlight-lcd0: driver: 'pwm-backlight': driver_bound: bound to device
[ 53.919213] mediatek-drm-dp 1c500000.edp-tx: Failed to create device link (0x180) with backlight-lcd0

So a SYNC_STATE_ONLY device link is created from backlight-lcd0 to edp-tx. When
the edp-tx probes, the link is dropped, since it is SYNC_STATE_ONLY. When the
backlight-lcd0 probes a new devlink is attempted to the consumer edp-tx and
fails, since it is useless, printing the warning.

You mentioned a cycle before. The only cycle I see is between the edp-tx and the
panel, but doesn't involve the backlight:

[ 0.198104] ----- cycle: start -----
[ 0.198105] /soc/edp-tx@1c500000/aux-bus/panel: cycle: depends on /soc/edp-tx@1c500000
[ 0.198112] ----- cycle: start -----
[ 0.198113] /soc/edp-tx@1c500000/aux-bus/panel: cycle: child of /soc/edp-tx@1c500000
[ 0.198119] /soc/edp-tx@1c500000: cycle: depends on /soc/edp-tx@1c500000/aux-bus/panel
[ 0.198125] ----- cycle: end -----
[ 0.198126] platform 1c500000.edp-tx: Fixed dependency cycle(s) with /soc/edp-tx@1c500000/aux-bus/panel

Just in case I tried using post-init-providers:

diff --git a/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi b/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi
index 75d56b2d5a3d..19df138ef043 100644
--- a/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi
@@ -322,6 +322,7 @@ &edp_tx {
pinctrl-names = "default";
pinctrl-0 = <&edptx_pins_default>;
+ post-init-providers = <&panel>;
ports {
#address-cells = <1>;
@@ -344,7 +345,7 @@ edp_out: endpoint {
};
aux-bus {
- panel {
+ panel: panel {
compatible = "edp-panel";
power-supply = <&pp3300_disp_x>;
backlight = <&backlight_lcd0>;

It broke the cycle, as I no longer see it in the logs, but the failed device
link warning is still there as expected.

It seems to me that the issue comes form the device link being SYNC_STATE_ONLY
in the first place. I see that comes from the 'else' path in

if (con->fwnode == link->consumer)
flags = fw_devlink_get_flags(link->flags);
else
flags = FW_DEVLINK_FLAGS_PERMISSIVE;

and the value on each side of the comparison is:

con->fwnode: /soc/edp-tx@1c500000
link->consumer: /soc/edp-tx@1c500000/aux-bus/panel

Could you help me understand what (if anything) is wrong here?

(I also noticed that as per the DT the consumer for backlight-lcd0 should be the
panel, but the devlink has it instead as the edp-tx, I'm guessing that's another
symptom of the same issue)


I did not seen any update on this. It would be great to get this fixed.

Thanks
Jon

--
nvpublic