RE: [PATCH V2] firmware: arm_scmi: bus: bypass set fwnode for scmi cpufreq

From: Peng Fan
Date: Mon Sep 02 2024 - 08:22:41 EST


Hi Saravana,

> Subject: RE: [PATCH V2] firmware: arm_scmi: bus: bypass set fwnode
> for scmi cpufreq
>
> > Subject: Re: [PATCH V2] firmware: arm_scmi: bus: bypass set fwnode
> for
> > scmi cpufreq
> >
> > On Wed, Aug 14, 2024 at 12:04 AM Peng Fan <peng.fan@xxxxxxx>
> > wrote:
> > >
> > > > Subject: Re: [PATCH V2] firmware: arm_scmi: bus: bypass set
> > fwnode
> > > > for scmi cpufreq
> > > >
> > > > On Tue, Aug 13, 2024 at 8:36 PM Peng Fan <peng.fan@xxxxxxx>
> > > > wrote:
> > > > >
> > > > > > Subject: RE: [PATCH V2] firmware: arm_scmi: bus: bypass set
> > > > fwnode
> > > > > > for scmi cpufreq
> > > > > >
> > > > > > > Subject: Re: [PATCH V2] firmware: arm_scmi: bus: bypass set
> > > > fwnode
> > > > > > for
> > > > > > > scmi cpufreq
> > > > > > >
> > > > > > > On Tue, Aug 13, 2024 at 01:52:30PM +0000, Peng Fan
> wrote:
> > > > > > > > > Subject: Re: [PATCH V2] firmware: arm_scmi: bus: bypass
> > > > > > > > > set
> > > > > > > fwnode
> > > > > > > > > for scmi cpufreq
> > > > > > > > >
> > > > > > > > > On Tue, Aug 13, 2024 at 10:25:31AM +0000, Peng Fan
> > wrote:
> > > > > > > > > > > Subject: Re: [PATCH V2] firmware: arm_scmi: bus:
> > > > > > > > > > > bypass set
> > > > > > > > > fwnode
> > > > > > > > > > > for scmi cpufreq
> > > > > > > > > > >
> > > > > > > > > > > On Jul 29, 2024 at 15:03:25 +0800, Peng Fan (OSS)
> > wrote:
> > > > > > > > > > > > From: Peng Fan <peng.fan@xxxxxxx>
> > > > > > > > > > > >
> > > > > > > > > > > > Two drivers scmi_cpufreq.c and
> scmi_perf_domain.c
> > > > both
> > > > > > use
> > > > > > > > > > > > SCMI_PROTCOL_PERF protocol, but with different
> > name,
> > > > so
> > > > > > > two
> > > > > > > > > scmi
> > > > > > > > > > > > devices will be created. But the fwnode->dev could
> > > > > > > > > > > > only point to one
> > > > > > > > > > > device.
> > > > > > > > > > > >
> > > > > > > > > > > > If scmi cpufreq device created earlier, the
> > > > > > > > > > > > fwnode->dev will point to the scmi cpufreq device.
> > > > > > > > > > > > Then the fw_devlink will link performance domain
> > > > > > > > > > > > user
> > > > > > > > > > > > device(consumer) to the scmi cpufreq
> > > > > > > > > device(supplier).
> > > > > > > > > > > > But actually the performance domain user device,
> > > > > > > > > > > > such
> > > > as
> > > > > > > > > > > > GPU,
> > > > > > > > > > > should
> > > > > > > > > > > > use the scmi perf device as supplier. Also if
> > > > 'cpufreq.off=1'
> > > > > > > > > > > > in bootargs, the GPU driver will defer probe
> > > > > > > > > > > > always, because of the scmi cpufreq
> > > > > > > > > > >
> > > > > > > > > > > The commit message itself seems very specific to
> > > > > > > > > > > some
> > > > > > platform
> > > > > > > > > > > to
> > > > > > > > > me.
> > > > > > > > > > > What about platforms that don't atall have a GPU?
> > Why
> > > > > > would
> > > > > > > > > they
> > > > > > > > > > > care about this?
> > > > > > > > > >
> > > > > > > > > > It is a generic issue if a platform has performance
> > > > > > > > > > domain to serve scmi cpufreq and device performance
> > level.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > device not ready.
> > > > > > > > > > > >
> > > > > > > > > > > > Because for cpufreq, no need use fw_devlink. So
> > > > > > > > > > > > bypass
> > > > > > > setting
> > > > > > > > > > > fwnode
> > > > > > > > > > > > for scmi cpufreq device.
> > > > > > > > > > > >
> > > > > > > > > > > > Fixes: 96da4a99ce50 ("firmware: arm_scmi: Set
> > fwnode
> > > > for
> > > > > > > the
> > > > > > > > > > > > scmi_device")
> > > > > > > > > > > > Signed-off-by: Peng Fan <peng.fan@xxxxxxx>
> > > > > > > > > > > > ---
> > > > > > > > > > > >
> > > > > > > > > > > > V2:
> > > > > > > > > > > > Use A!=B to replace !(A == B) Add fixes tag
> > > > > > > > > > > > This might be a workaround, but since this is a
> > > > > > > > > > > > fix, it is simple for backporting.
> > > > > > > > > > >
> > > > > > > > > > > More than a workaround, it feels like a HACK to me.
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > V1:
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > drivers/firmware/arm_scmi/bus.c | 3 ++-
> > > > > > > > > > > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > > > > > > > >
> > > > > > > > > > > > diff --git a/drivers/firmware/arm_scmi/bus.c
> > > > > > > > > > > > b/drivers/firmware/arm_scmi/bus.c index
> > > > > > > > > > > 96b2e5f9a8ef..be91a82e0cda
> > > > > > > > > > > > 100644
> > > > > > > > > > > > --- a/drivers/firmware/arm_scmi/bus.c
> > > > > > > > > > > > +++ b/drivers/firmware/arm_scmi/bus.c
> > > > > > > > > > > > @@ -395,7 +395,8 @@
> __scmi_device_create(struct
> > > > > > > > > device_node
> > > > > > > > > > > *np, struct device *parent,
> > > > > > > > > > > > scmi_dev->id = id;
> > > > > > > > > > > > scmi_dev->protocol_id = protocol;
> > > > > > > > > > > > scmi_dev->dev.parent = parent;
> > > > > > > > > > > > - device_set_node(&scmi_dev->dev,
> > > > > > > of_fwnode_handle(np));
> > > > > > > > > > > > + if ((protocol != SCMI_PROTOCOL_PERF) ||
> > > > > > > strcmp(name,
> > > > > > > > > > > "cpufreq"))
> > > > > > > > > > > > + device_set_node(&scmi_dev->dev,
> > > > > > > > > > > of_fwnode_handle(np));
> > > > > > > > > > >
> > > > > > > > > > > I kind of disagree with the idea here to be specific
> > > > > > > > > > > about the PROTOCOL_PERF or cpufreq. This is a
> > > > > > > > > > > generic arm scmi bus
> > > > > > > driver
> > > > > > > > > right?
> > > > > > > > > > > Why bring in specific code into a bus driver? We
> > > > > > > > > > > will never fix the actual root cause of the issue this
> way.
> > > > > > > > > >
> > > > > > > > > > The root cause is fwnode devlink only supports one
> > > > > > > > > > fwnode, one
> > > > > > > > > device.
> > > > > > > > > > 1:1 match. But current arm scmi driver use one fwnode
> > > > > > > > > > for two
> > > > > > > > > devices.
> > > > > > > > > >
> > > > > > > > > > If your platform has scmi cpufreq and scmi device
> > > > > > > > > > performance
> > > > > > > > > domain,
> > > > > > > > > > you might see that some devices are consumer of scmi
> > > > > > > > > > cpufreq,
> > > > > > > but
> > > > > > > > > > actually they should be consumer of scmi device
> > > > performance
> > > > > > > > > domain.
> > > > > > > > > >
> > > > > > > > > > I not have a good idea that this is fw devlink design
> > > > > > > > > > that only allows
> > > > > > > > > > 1 fwnode has 1 device or not. If yes, that arm scmi
> > > > > > > > > > should be
> > > > > > fixed.
> > > > > > > > > > If not, fw devlink should be updated.
> > > > > > > > > >
> > > > > > > > > > The current patch is the simplest method for stable
> > > > > > > > > > tree fixes as I could work out.
> > > > > > > > >
> > > > > > > > > So this is the same root cause at the end of the issues
> > > > > > > > > you had with IMX pinctrl coexistence...i.e. the SCMI
> > > > > > > > > stack creates scmi devices that embeds the protocol
> > > > > > > > > node, BUT since you
> > > > can
> > > > > > > > > have multiple device/drivers doing different things on
> > > > > > > > > different resources within the same protocol you can end
> > > > > > > > > with
> > > > > > > > > 2 devices having the same embedded device_node, since
> > we
> > > > dont
> > > > > > > > > really
> > > > > > have
> > > > > > > > > anything else to
> > > > > > > use
> > > > > > > > > as device_node, I rigth ?
> > > > > > > >
> > > > > > > > I think, yes. And you remind me that with PINCTRL_SCMI
> > and
> > > > > > > > CONFIG_PINCTRL_IMX_SCMI both enabled, the scmi
> pinctrl
> > > > node
> > > > > > will
> > > > > > > only
> > > > > > > > take one to set the fwnode device pointer depends on the
> > > > > > > > order to
> > > > > > > run
> > > > > > > > __scmi_device_create.
> > > > > > > >
> > > > > > > > So not only perf, pinctrl also has issue here, fwnode
> > > > > > > > devlink will not work properly for pinctrl/perf.
> > > > > > >
> > > > > > > ...mmm ... the standard generic Pinctrl driver and the IMX
> > > > > > > Pintrcl are mutually exclusive in the code (@probe time)
> > > > > > > itself as far as I can remember about what you did, so why
> > > > > > > devlink should have
> > > > that
> > > > > > > issue there ?
> > > > > > > Have you seen any issue in this regards while having loaded
> > > > > > > pinctrl_scmi and pinctrl_imx_scmi ?
> > > > > >
> > > > > > No. it works well in my setup. I am just worried that there
> > > > > > might be issues because fwnode only has one dev pointer, see
> > device_add.
> > > > > >
> > > > > > >
> > > > > > > I want to have a better look next days about this devlink
> > > > > > > issue that you reported...it still not clear to me...from
> > > > > > > device_link_add() docs, it seems indeed that it will return
> > > > > > > the old existing link if a link exist already between that
> > > > > > > same supplier/consumer devices
> > > > > > pair....but
> > > > > > > from the code (at first sight) it seems that the check is
> > > > > > > made agains the devices not their embeded device_nodes,
> but
> > > > > > > here
> > > > (and
> > > > > > > in pinctrl/imx case) you will have 2 distinct consumer
> > > > > > > devices (with
> > > > > > same
> > > > > > > device_node)...I may have missed something in your
> > > > exaplanation....
> > > > > >
> > > > > > It might be false alarm for pinctrl, but it is true issue for perf.
> > > > >
> > > > > Just give a recheck. Pinctrl has same issue. With PINCTRL_SCMI
> > and
> > > > > PINCTRL_IMX_SCMI both enabled, two scmi devices will be
> > created.
> > > > So it
> > > > > depends on device creation order, 1st created device will be
> > > > > used as supplier.
> > > > >
> > > > > On i.MX, there is no issue with both enabled, it is because the
> > > > > imx scmi device is created first. If the generic pinctrl scmi
> > > > > device created first, imx will have issue.
> > > > >
> > > > > In the end, this is generic fw_devlink limitation or arm scmi
> > > > > driver issue.
> > > >
> > > > +1 to what Cristian and Dhruva said in other parts of this thread.
> > > >
> > > > This patch itself is definitely a hack.
> > > >
> > > > The problem isn't so much that fw_devlink doesn't want to
> support
> > > > multiple devices getting instantiated from one DT node. The
> > problem
> > > > is that there's no way to know which of the multiple devices is
> > > > the real supplier just by looking at the information in
> > > > devicetree/firmware (the fw in fw_devlink).
> > >
> > > Yes. I see.
> > >
> > > And keep in mind that one of the main requirements
> > > > of fw_devlink is to work before any driver is loaded and not
> > > > depend on drivers for correctness of the dependency information
> > > > because it needs to work on a fully modular kernel too. So,
> > > > fw_devlink just picks the first device that's instantiated from a DT
> node.
> > > >
> > > > I really hate folks creating multiple devices from one DT node.
> > > > One IP block can support multiple things, there's no need to
> > > > instantiate multiple devices for it. The same driver could have
> > > > just as easily registered with multiple frameworks. So, ideally
> > > > I'd want us to fix this issue in the SCMI framework code. In the
> > > > case where the same SCMI node is creating two devices, can they
> both probe successfully?
> > >
> > > In pinctrl case, only one could probe successfully.
> > > See
> > > drivers/pinctrl/pinctrl-scmi.c for generic pinctrl scmi driver
> > > drivers/pinctrl/freescale/pinctrl-imx-scmi.c for i.MX extension.
> > >
> > > There is a machine compatible string to make sure only one could
> > probe
> > > successfully.
> >
> > Why?! Isn't this the whole point of compatible strings and being able
> > to list multiple compatible strings for a DT node?
>
> The scmi devices are actually not populated from DT node, they are
> create from driver according to a table if the reg(protocol_id) exists in
> DT, see drivers/firmware/arm_scmi/bus.c "
> sdev = __scmi_device_create(np, parent,
> rdev->id_table->protocol_id,
> rdev->id_table->name); "
>
> For compatible strings, I could not comment on the design. Cristian
> seems replied in the other mail.
>
> >
> > >
> > > In scmi performance case, both could probe successfully.
> > > drivers/cpufreq/scmi-cpufreq.c
> > > drivers/pmdomain/arm/scmi_perf_domain.c
> > >
> > > If yes,
> > > > why are we not using a child node or a separate node for this
> > second
> > > > device? If it's always one or the other, why are we creating two
> > devices?
> > >
> > > Sudeep and Cristian may comment on the design goal of scmi
> drivers.
> > >
> > > > Can you please point to specific upstream DT examples for me to
> > get
> > > > a better handle on what's going on?
> > >
> > > See linux-next tree:
> > > arch/arm64/boot/dts/freescale/imx95.dtsi
> > >
> > > scmi_perf: protocol@13 {
> > > reg = <0x13>;
> > > #power-domain-cells = <1>;
> > > };
> > >
> > > scmi_iomuxc: protocol@19 {
> > > reg = <0x19>;
> > > };
> >
> > Sure, I can see that file and these nodes, but what am I supposed to
> > make of it? Can you be more specific please?
>
> Sorry for not being clear.
>
> I'm not familiar with the
> > protocol number mapping.
>
> The ID is in include/linux/scmi_protocol.h, line 918.
>
> Which nodes are the consumers in this
> > example?
>
> For " protocol@13", it is A55 cpu core with " power-domains =
> <&scmi_perf IMX95_PERF_A55>;" and other devices(GPU/VPU) that
> not in upstream tree as of now.
>
> Which node has more than one device created? Both of
> > these?
>
> Yes, both.
> drivers/firmware/arm_scmi/bus.c
> __scmi_device_create will create device according to "static const
> struct scmi_device_id scmi_id_table", so for protocol@19, It will create
> two devices, one is "{ SCMI_PROTOCOL_PINCTRL, "pinctrl" }", The other
> is " { SCMI_PROTOCOL_PINCTRL, "pinctrl-imx" }, ", both points to the
> node protocol@19, see "device_set_node(&scmi_dev->dev,
> of_fwnode_handle(np)); " in "__scmi_device_create".
>
> >
> > Can you also point me to specific examples for the pinctrl thing you
> > mention above?
>
> As above, two devices are created, both points to node protocol@19,
> but fwnode protocol@19 will only have 1st device as supplier.
>
> >
> > I'll take a closer look tomorrow.

Do you have a chance to look into the scmi stuff?

Thanks,
Peng.

>
> Thanks,
> Peng.
>
> >
> > -Saravana
> >
> > >
> > > >
> > > > Btw, there is the deferred_probe_timeout command line option
> > that
> > > > can be used so that fw_devlink stops enforcing dependencies
> where
> > > > there are no supplier drivers for a device after a timeout. It's not
> > > > ideal, but it's something to unblock you.
> > > >
> > > > The best fw_devlink could do is just not enforce any dependencies
> > if
> > > > there is more than one device instantiated for a given supplier DT
> > node.
> > >
> > > You mean not enforce fw_devlink for all or this could only apply to
> > > specific nodes or devices?
> > >
> > > Thanks,
> > > Peng.
> > >
> > > >
> > > > -Saravana