Re: [PATCH 1/3] HID: logitech-hidpp: use devres to manage FF private data

From: Benjamin Tissoires
Date: Mon Oct 14 2019 - 05:13:48 EST


On Sat, Oct 12, 2019 at 1:24 AM Dmitry Torokhov
<dmitry.torokhov@xxxxxxxxx> wrote:
>
> On Sat, Oct 12, 2019 at 12:48:42AM +0200, Benjamin Tissoires wrote:
> > On Fri, Oct 11, 2019 at 11:34 PM Dmitry Torokhov
> > <dmitry.torokhov@xxxxxxxxx> wrote:
> > >
> > > On Fri, Oct 11, 2019 at 01:35:09PM -0700, Dmitry Torokhov wrote:
> > > > On Fri, Oct 11, 2019 at 01:33:03PM -0700, Dmitry Torokhov wrote:
> > > > > On Fri, Oct 11, 2019 at 09:25:52PM +0200, Benjamin Tissoires wrote:
> > > > > > On Fri, Oct 11, 2019 at 8:26 PM Dmitry Torokhov
> > > > > > <dmitry.torokhov@xxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Fri, Oct 11, 2019 at 04:52:04PM +0200, Benjamin Tissoires wrote:
> > > > > > > > Hi Andrey,
> > > > > > > >
> > > > > > > > On Mon, Oct 7, 2019 at 7:13 AM Andrey Smirnov <andrew.smirnov@xxxxxxxxx> wrote:
> > > > > > > > >
> > > > > > > > > To simplify resource management in commit that follows as well as to
> > > > > > > > > save a couple of extra kfree()s and simplify hidpp_ff_deinit() switch
> > > > > > > > > driver code to use devres to manage the life-cycle of FF private data.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Andrey Smirnov <andrew.smirnov@xxxxxxxxx>
> > > > > > > > > Cc: Jiri Kosina <jikos@xxxxxxxxxx>
> > > > > > > > > Cc: Benjamin Tissoires <benjamin.tissoires@xxxxxxxxxx>
> > > > > > > > > Cc: Henrik Rydberg <rydberg@xxxxxxxxxxx>
> > > > > > > > > Cc: Sam Bazely <sambazley@xxxxxxxxxxxx>
> > > > > > > > > Cc: Pierre-Loup A. Griffais <pgriffais@xxxxxxxxxxxxxxxxx>
> > > > > > > > > Cc: Austin Palmer <austinp@xxxxxxxxxxxxxxxxx>
> > > > > > > > > Cc: linux-input@xxxxxxxxxxxxxxx
> > > > > > > > > Cc: linux-kernel@xxxxxxxxxxxxxxx
> > > > > > > > > Cc: stable@xxxxxxxxxxxxxxx
> > > > > > > >
> > > > > > > > This patch doesn't seem to fix any error, is there a reason to send it
> > > > > > > > to stable? (besides as a dependency of the rest of the series).
> > > > > > > >
> > > > > > > > > ---
> > > > > > > > > drivers/hid/hid-logitech-hidpp.c | 53 +++++++++++++++++---------------
> > > > > > > > > 1 file changed, 29 insertions(+), 24 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/hid/hid-logitech-hidpp.c b/drivers/hid/hid-logitech-hidpp.c
> > > > > > > > > index 0179f7ed77e5..58eb928224e5 100644
> > > > > > > > > --- a/drivers/hid/hid-logitech-hidpp.c
> > > > > > > > > +++ b/drivers/hid/hid-logitech-hidpp.c
> > > > > > > > > @@ -2079,6 +2079,11 @@ static void hidpp_ff_destroy(struct ff_device *ff)
> > > > > > > > > struct hidpp_ff_private_data *data = ff->private;
> > > > > > > > >
> > > > > > > > > kfree(data->effect_ids);
> > > > > > > >
> > > > > > > > > Is there any reason we cannot also devm alloc data->effect_ids?
> > > > > > > >
> > > > > > > > > + /*
> > > > > > > > > + * Set private to NULL to prevent input_ff_destroy() from
> > > > > > > > > + * freeing our devres allocated memory
> > > > > > > >
> > > > > > > > Ouch. There is something wrong here: input_ff_destroy() calls
> > > > > > > > kfree(ff->private), when the data has not been allocated by
> > > > > > > > input_ff_create(). This seems to lack a little bit of symmetry.
> > > > > > >
> > > > > > > Yeah, ff and ff-memless essentially take over the private data assigned
> > > > > > > to them. They were done before devm and the lifetime of the "private"
> > > > > > > data pieces was tied to the lifetime of the input device to simplify
> > > > > > > error handling and teardown.
> > > > > >
> > > > > > Yeah, that stealing of the pointer is not good :)
> > > > > > But OTOH, it helps
> > > > > >
> > > > > > >
> > > > > > > Maybe we should clean it up a bit... I'm open to suggestions.
> > > > > >
> > > > > > The problem I had when doing the review was that there is no easy way
> > > > > > to have a "devm_input_ff_create_()", because the way it's built is
> > > > > > already "devres-compatible": the destroy gets called by input core.
> > > > >
> > > > > I do not think we want devm_input_ff_create() explicitly, I think the
> > > > > fact that you can "build up" an input device by allocating it, then
> > > > > adding slots, poller, ff support, etc, and input core cleans it up is
> > > > > > all good. It is just the ownership of the driver-private data block is
> > > > > not very obvious and is not compatible with allocating via devm.
> > > > >
> > > > > >
> > > > > > So I don't have a good answer to simplify in a transparent manner
> > > > > > without breaking the API.
> > > > > >
> > > > > > >
> > > > > > > In this case maybe best way is to get rid of hidpp_ff_destroy() and not
> > > > > > > set ff->private and rely on devm to free the buffers. One can get to
> > > > > > > device private data from ff methods via input_get_drvdata() since they
> > > > > > > all (except destroy) are passed input device pointer.
> > > > > >
> > > > > > Sounds like a good idea. However, it seems there might be a race when
> > > > > > removing the workqueue:
> > > > > > the workqueue gets deleted in hidpp_remove, when the input node will
> > > > > > be freed by devres, so after the call of hidpp_remove.
> > > > >
> > > > > Yeah, well, that is a common issue with mixing devm and normal resources
> > > > > (and workqueue here is that "normal" resource), and we should either:
> > > > >
> > > > > - not use devm
> > > > > > - use devm_add_action_or_reset() to register custom actions that fold
> > > > > > the freeing of non-managed resources into the devm flow.
> > > >
> > > > Actually, there is a door #3: use system workqueue. After all the work
> > > > > that Tejun has done on workqueues it is very rare that one actually needs
> > > > > a dedicated workqueue (as works usually execute on one of the system
> > > > worker threads that are shared with other workqueues anyway).
> > >
> > > And additional note about devm:
> > >
> > > I think all HID input drivers that are using devm in probe, but do not
> > > have proper remove() function (and maybe even some with remove) are
> > > broken: hid_device_remove() calls hid_hw_stop() which potentially will
> > > shut off the transport. This happens before devm starts unwinding, so
> > > we still can be trying to communicate with the device in question, but
> > > the transport is gone.
> >
> > Well, that is by design. A driver is supposed to call hid_hw_start()
> > at the very end of its .probe(). And the supposed rule is that in the
> > specific .remove(), you are to call first hid_hw_stop() to stop the
> > transport layer underneath. That also means that in the HID subsystem,
> > at least, you are not supposed to talk to the device during the devm
> > teardown of the allocated data.
> >
> > If you really need to communicate with the device during tear down,
> > then you are supposed to write your own .remove, in which you control
> > where the hid_hw_stop() happens.
> >
> > We might have overlooked one or two, but I think we are on a good basis for now.
>
> You have to be _very_ careful there. For example, we can take a look at
> hid-elan.c. If you notice, it uses devm_led_classdev_register() to
> create "mute" led and it needs to talk over HID to control its
> brightness/state. So the driver has custom remove() and calls
> hid_hw_stop() from it. But the LED will be unregistered much later (in
> the depth of the driver core) so users of LED subsystem are free to send
> requests through and the driver will try to talk to the device even
> after hid_hw_stop() is called and the io_started/driver_input_lock is
> reset/released.
>
> I am sure there are more such examples.

Yep, this is problematic. There is no guard in
elan_mute_led_set_brigtness() which tells us that the bus has been
stopped, so we likely have an issue here.
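
To make the missing guard concrete, here is a minimal userspace model in
plain C. It is not the hid-elan code: struct model_dev, its transport_up
flag and model_led_set() are hypothetical names made up for this sketch.
The only point is that the callback refuses to touch the bus once it has
been stopped.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a HID device; not a kernel structure. */
struct model_dev {
	bool transport_up;	/* true between the modelled start and stop */
	int reports_sent;	/* number of requests that reached the "bus" */
};

static void model_hw_start(struct model_dev *dev)
{
	dev->transport_up = true;
}

static void model_hw_stop(struct model_dev *dev)
{
	dev->transport_up = false;
}

/*
 * Models an LED brightness callback that can still be invoked after the
 * transport was stopped, because the LED is unregistered later by devres.
 */
static int model_led_set(struct model_dev *dev, int brightness)
{
	(void)brightness;	/* the value is irrelevant to the model */

	if (!dev->transport_up)
		return -ENODEV;	/* bus is gone: refuse instead of talking */

	dev->reports_sent++;
	return 0;
}
```

A real driver would additionally need locking around the flag, otherwise
the check can race with the stop.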

Note that a .remove() that just calls hid_hw_stop() should be removed,
as hid core can do it for us.

>
> >
> > >
> > > io_started/driver_input_lock is broken on removal as well as we release
> > > the lock when driver may very well be still talking to the device in
> > > devm teardown actions.
> >
> > Again, this is not supposed to happen. Once hid_hw_stop() is called,
> > we do not have access to the transport, so drivers can't talk to the
> > device. So releasing/clearing the locks is supposed to be safe now.
>
> Except that it is hard to enforce once you throw in devm.
>
> >
> > >
> > > I think we have similar kind of issues in other buses as well (i2c, spi,
> > > etc). For example, in i2c we remove the device from power domain before
> > > we actually complete devm unwinding.
> > >
> >
> > I agree that this looks bad.
> >
> > I would need to have a better look at it on Monday. Time to go on week
> > end (this jet lag doesn't help me to go to sleep...)
>
> I wonder if every bus should open a new devm group for device and
> manually release it after calling ->remove(). That would ensure that all
> devm resources allocated by drivers will be freed before we start
> executing bus-specific code.
>

That would indeed be useful. There is no reason I can think of for a
resource to be created during the .probe() of a device that should
stick around after its .remove().

In the Elan case above, it won't solve all of the issues, as there
will still be a tiny window where the resource will get access to the
bus when it has been stopped.
Maybe adding another group when we call hid_hw_start() that will be
freed by hid_hw_stop() before the actual stop to the bus could come to
the rescue....
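
A rough userspace model of that ordering, in plain C. This is only a
sketch under assumptions, not HID core code: struct model_devres,
model_bus_start(), model_bus_stop() and the model_led example resource
are all hypothetical. In the kernel the grouping would presumably be
built on devres_open_group() / devres_release_group().

```c
#include <stddef.h>

#define MODEL_MAX_RES 16

typedef void (*model_release_fn)(void *data);

/* Hypothetical userspace model of a per-device devres list. */
struct model_devres {
	model_release_fn release[MODEL_MAX_RES];
	void *data[MODEL_MAX_RES];
	size_t count;		/* number of registered resources */
	size_t group_start;	/* index where the "hw" group begins */
};

/* Example resource: records whether the bus was up at release time. */
struct model_led {
	const int *bus_up;
	int released_while_up;
};

static void model_led_release(void *data)
{
	struct model_led *led = data;

	led->released_while_up = *led->bus_up;
}

/* Register a release action; returns 0 on success, -1 if the list is full. */
static int model_add_action(struct model_devres *dr, model_release_fn fn,
			    void *data)
{
	if (dr->count >= MODEL_MAX_RES)
		return -1;
	dr->release[dr->count] = fn;
	dr->data[dr->count] = data;
	dr->count++;
	return 0;
}

/* Release everything registered at or after 'from', newest first. */
static void model_release_from(struct model_devres *dr, size_t from)
{
	while (dr->count > from) {
		dr->count--;
		dr->release[dr->count](dr->data[dr->count]);
	}
}

/* Modelled start: bring the bus up and open the group. */
static void model_bus_start(struct model_devres *dr, int *bus_up)
{
	*bus_up = 1;
	dr->group_start = dr->count;
}

/* Modelled stop: release the group first, only then stop the bus. */
static void model_bus_stop(struct model_devres *dr, int *bus_up)
{
	model_release_from(dr, dr->group_start);
	*bus_up = 0;
}
```

Anything registered after the modelled start lands in the group, so its
release callback runs while the bus is still up.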

Cheers,
Benjamin