Re: [PATCH v2] gpiolib: Bind gpio_device to a driver to enable fw_devlink=on by default

From: Greg Kroah-Hartman
Date: Wed Jan 20 2021 - 10:54:36 EST


On Wed, Jan 20, 2021 at 03:39:30PM +0000, Marc Zyngier wrote:
> On 2021-01-18 20:38, Saravana Kannan wrote:
> > On Sun, Jan 17, 2021 at 4:02 AM Marc Zyngier <maz@xxxxxxxxxx> wrote:
> > >
> > > Hi Saravana,
> > >
> > > Thanks for posting this, much appreciated.
> > >
> > > On Sat, 16 Jan 2021 01:14:11 +0000,
> > > Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
> > > >
> > > > There are multiple instances of GPIO devictree nodes of the form:
> > > >
> > > > foo {
> > > > compatible = "acme,foo";
> > > > ...
> > > >
> > > > gpio0: gpio0@xxxxxxxx {
> > > > compatible = "acme,bar";
> > > > ...
> > > > gpio-controller;
> > > > };
> > > >
> > > > gpio1: gpio1@xxxxxxxx {
> > > > compatible = "acme,bar";
> > > > ...
> > > > gpio-controller;
> > > > };
> > > >
> > > > ...
> > > > }
> > > >
> > > > bazz {
> > > > my-gpios = <&gpio0 ...>;
> > > > }
> > > >
> > > > Case 1: The driver for "foo" populates struct device for these gpio*
> > > > nodes and then probes them using a driver that binds with "acme,bar".
> > > > This lines up with how DT nodes with the "compatible" property are
> > > > generally converted to struct devices and then registered with driver
> > > > core to probe them. This also allows the gpio* devices to hook into all
> > > > the driver core capabilities like runtime PM, probe deferral,
> > > > suspend/resume ordering, device links, etc.
> > > >
> > > > Case 2: The driver for "foo" doesn't populate its child device nodes
> > > > with "compatible" property and instead just loops through its child
> > > > nodes and directly registers the GPIOs with gpiolib without ever
> > > > populating a struct device or binding a driver to it.
> > >
> > > That's not quite an accurate description. The gpiolib subsystem does
> > > create a struct device. It doesn't register a driver though, which is
> > > what causes the issue with fr_devlink (more on that below).
> >
> > The devices created by gpiolib care are created for case 1 and case 2.
> > They are just devices gpiolib uses to represent a virtual software
> > device to hook different attributes to and expose them in sysfs. I'm
> > not talking about those devices here. The devices I'm referring to are
> > devices that represent the actual HW IP -- so what I'm saying is
> > accurate.
>
> Please read what you wrote : "without ever populating a struct device".
> I stand by the "not quite accurate".
>
> >
> > > >
> > > > Drivers that follow the case 2 cause problems with fw_devlink=on. This
> > > > is because fw_devlink will prevent bazz from probing until there's a
> > > > struct device that has gpio0 as its fwnode (because bazz lists gpio0 as
> > > > a GPIO supplier). Once the struct device is available, fw_devlink will
> > > > create a device link between with gpio0 as the supplier and bazz as the
> > > > consumer. After this point, the device link will prevent bazz from
> > > > probing until its supplier (the gpio0 device) has bound to a driver.
> > > > Once the supplier is bound to a driver, the probe of bazz is triggered
> > > > automatically.
> > > >
> > > > Finding and refactoring all the instances of drivers that follow case 2
> > > > will cause a lot of code churn and it not something that can be done in
> > > > one shot. Examples of such instances are [1] [2].
> > > >
> > > > This patch works around this problem and avoids all the code churn by
> > > > simply creating a stub driver to bind to the gpio_device. Since the
> > > > gpio_device already points to the GPIO device tree node, this allows all
> > > > the consumers to continue probing when the driver follows case 2.
> > > >
> > > > If/when all the old drivers are refactored, we can revert this
> > > > patch.
> > >
> > > My personal gripe with this approach is that it is an abrupt change in
> > > the way DT and device model are mapped onto each other.
> > >
> > > As far as I know (and someone please correct me if I am wrong), there
> > > is zero expectation that a device/subdevice/random IP block described
> > > by a node with a "compatible" property will end-up being managed by a
> > > driver that is bound to that particular node.
> > >
> > > The node/subnode division is a good way to express some HW boundaries,
> > > but doesn't say anything about the way this should be handled in the
> > > kernel. Assuming that everything containing a "compatible" string will
> > > eventually be bound to a driver is IMO pretty limiting.
> >
> > The default behavior of of_platform_populate() is to create a struct
> > device for every node with "compatible" property. That's how top level
> > devices (or children of simple-bus devices) are populated. IIRC,
> > that's what a lot of other busses do too. So, if anything, not having
> > a struct device for nodes with "compatible" property is an exception.
>
> Again, that's not a description of the reality. The case we are talking
> about here does have a struct device. The misconception you keep
> repeating is that binding it to a driver is the only valid approach.
>
> >
> > Honestly, if one has a driver that supports a HW IP, I don't see any
> > reason it should operate outside of the device-driver model supported
> > by driver core. The driver code is there for a reason -- to solve all
> > the common problems faced by drivers.
>
> News flash: this isn't the case. Most of the code I deal with cannot
> be represented as a driver, because it is needed way earlier
> than the device model.
>
> > Operating outside of it just
> > causes reinventing the wheel, things like playing chicken with
> > initcalls to make sure drivers initialize their device in the right
> > order, not working with deferred probe, etc. For example, the rockchip
> > driver in your device (the one that follows case 2) tries to get some
> > clocks. But if that fails with -EPROBE_DEFER, the driver has no way
> > for it to recover and just doesn't register the GPIO anymore.
> >
> > Obviously exceptions are allowed for devices that are needed before
> > the driver core even comes up -- like early, clocks, irqs and timers
> > for the kernel/scheduler to kick off.
>
> There you go. Exceptions. Tons of. The trouble is, you are rewriting
> the rules of the game. Except that you are about 10 year late to
> the party. Forcing your changes on everyone by saying that was
> perfectly valid a month ago is now illegal doesn't really fly.
>
> Anyway, I said what I had to say. If platforms break with this
> change, I'll expect it to be disabled in 5.12.

I'm thinking we can not change the default and will probably revert this
patch "soon".

thanks,

greg k-h