Re: [PATCH] input/psmouse: Don't hold the mutex while calling ->disconnect

From: Daniel Vetter
Date: Mon Oct 01 2018 - 05:24:03 EST


On Wed, May 2, 2018 at 11:06 AM Daniel Vetter <daniel.vetter@xxxxxxxx> wrote:
>
> On Mon, Apr 30, 2018 at 11:17 PM, Dmitry Torokhov
> <dmitry.torokhov@xxxxxxxxx> wrote:
> > Hi Daniel,
> >
> > On Mon, Apr 30, 2018 at 09:56:49PM +0200, Daniel Vetter wrote:
> >> At least trackpoint_disconnect wants to remove some sysfs files, and
> >> we can't remove sysfs files while holding psmouse_mutex:
> >>
> >> ======================================================
> >> WARNING: possible circular locking dependency detected
> >> 4.16.0-rc5-g613eb885b69e-drmtip_1+ #1 Tainted: G U
> >> ------------------------------------------------------
> >> kworker/0:3/102 is trying to acquire lock:
> >> (kn->count#130){++++}, at: [<000000009679748b>] kernfs_remove_by_name_ns+0x3b/0x80
> >>
> >> but task is already holding lock:
> >> (psmouse_mutex){+.+.}, at: [<0000000014f44bcc>] psmouse_disconnect+0x62/0x160
> >>
> >> which lock already depends on the new lock.
> >>
> >> the existing dependency chain (in reverse order) is:
> >>
> >> -> #1 (psmouse_mutex){+.+.}:
> >> psmouse_attr_set_helper+0x28/0x140
> >> kernfs_fop_write+0xfe/0x180
> >> __vfs_write+0x1e/0x130
> >> vfs_write+0xbd/0x1b0
> >> SyS_write+0x40/0xa0
> >> do_syscall_64+0x65/0x1a0
> >> entry_SYSCALL_64_after_hwframe+0x42/0xb7
> >>
> >> -> #0 (kn->count#130){++++}:
> >> __kernfs_remove+0x243/0x2b0
> >> kernfs_remove_by_name_ns+0x3b/0x80
> >> remove_files.isra.0+0x2b/0x60
> >> sysfs_remove_group+0x38/0x80
> >> sysfs_remove_groups+0x24/0x40
> >> trackpoint_disconnect+0x2c/0x50
> >> psmouse_disconnect+0x8f/0x160
> >> serio_disconnect_driver+0x28/0x40
> >> serio_driver_remove+0xc/0x10
> >> device_release_driver_internal+0x15b/0x230
> >> serio_handle_event+0x1c8/0x260
> >> process_one_work+0x215/0x620
> >> worker_thread+0x48/0x3a0
> >> kthread+0xfb/0x130
> >> ret_from_fork+0x3a/0x50
> >>
> >> other info that might help us debug this:
> >>
> >> Possible unsafe locking scenario:
> >>
> >> CPU0 CPU1
> >> ---- ----
> >> lock(psmouse_mutex);
> >> lock(kn->count#130);
> >> lock(psmouse_mutex);
> >> lock(kn->count#130);
> >>
> >> *** DEADLOCK ***
> >>
> >> 6 locks held by kworker/0:3/102:
> >> #0: ((wq_completion)"events_long"){+.+.}, at: [<000000002e408bfa>] process_one_work+0x191/0x620
> >> #1: (serio_event_work){+.+.}, at: [<000000002e408bfa>] process_one_work+0x191/0x620
> >> #2: (serio_mutex){+.+.}, at: [<00000000c8a49847>] serio_handle_event+0x23/0x260
> >> #3: (&dev->mutex){....}, at: [<00000000b55eee75>] device_release_driver_internal+0x2f/0x230
> >> #4: (&serio->drv_mutex){+.+.}, at: [<000000009719f997>] serio_disconnect_driver+0x16/0x40
> >> #5: (psmouse_mutex){+.+.}, at: [<0000000014f44bcc>] psmouse_disconnect+0x62/0x160
> >>
> >> stack backtrace:
> >> CPU: 0 PID: 102 Comm: kworker/0:3 Tainted: G U 4.16.0-rc5-g613eb885b69e-drmtip_1+ #1
> >> Hardware name: LENOVO 74591P0/74591P0, BIOS 6DET28WW (1.05 ) 07/30/2008
> >> Workqueue: events_long serio_handle_event
> >> Call Trace:
> >> dump_stack+0x5f/0x86
> >> print_circular_bug.isra.18+0x1d0/0x2c0
> >> __lock_acquire+0x14ae/0x1b60
> >> ? kernfs_remove_by_name_ns+0x20/0x80
> >> ? lock_acquire+0xaf/0x200
> >> lock_acquire+0xaf/0x200
> >> ? kernfs_remove_by_name_ns+0x3b/0x80
> >> __kernfs_remove+0x243/0x2b0
> >> ? kernfs_remove_by_name_ns+0x3b/0x80
> >> ? kernfs_name_hash+0xd/0x70
> >> ? kernfs_find_ns+0x7e/0x100
> >> kernfs_remove_by_name_ns+0x3b/0x80
> >> remove_files.isra.0+0x2b/0x60
> >> sysfs_remove_group+0x38/0x80
> >> sysfs_remove_groups+0x24/0x40
> >> trackpoint_disconnect+0x2c/0x50
> >> psmouse_disconnect+0x8f/0x160
> >> serio_disconnect_driver+0x28/0x40
> >> serio_driver_remove+0xc/0x10
> >> device_release_driver_internal+0x15b/0x230
> >> serio_handle_event+0x1c8/0x260
> >> process_one_work+0x215/0x620
> >> worker_thread+0x48/0x3a0
> >> ? _raw_spin_unlock_irqrestore+0x4c/0x60
> >> kthread+0xfb/0x130
> >> ? process_one_work+0x620/0x620
> >> ? _kthread_create_on_node+0x30/0x30
> >> ret_from_fork+0x3a/0x50
> >>
> >> Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxxx>
> >> Cc: Dmitry Torokhov <dmitry.torokhov@xxxxxxxxx>
> >> Cc: Benjamin Tissoires <benjamin.tissoires@xxxxxxxxxx>
> >> Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
> >> Cc: Arvind Yadav <arvind.yadav.cs@xxxxxxxxx>
> >> Cc: Stephen Lyons <slysven@xxxxxxxxxxxxxxx>
> >> Cc: linux-input@xxxxxxxxxxxxxxx
> >> ---
> >> drivers/input/mouse/psmouse-base.c | 2 ++
> >> 1 file changed, 2 insertions(+)
> >>
> >> diff --git a/drivers/input/mouse/psmouse-base.c b/drivers/input/mouse/psmouse-base.c
> >> index 8900c3166ebf..06ccd8e22f3c 100644
> >> --- a/drivers/input/mouse/psmouse-base.c
> >> +++ b/drivers/input/mouse/psmouse-base.c
> >> @@ -1484,8 +1484,10 @@ static void psmouse_disconnect(struct serio *serio)
> >> psmouse_deactivate(parent);
> >> }
> >>
> >> + mutex_unlock(&psmouse_mutex);
> >> if (psmouse->disconnect)
> >> psmouse->disconnect(psmouse);
> >> + mutex_lock(&psmouse_mutex);
> >
> > Why do you think it is proper to drop this mutex? It is introduced for a
> > reason.
> >
> > I think the trace you are seeing is due to:
> >
> > commit 988cd7afb3f37598891ca70b4c6eb914c338c46a
> > Author: Tejun Heo <tj@xxxxxxxxxx>
> > Date: Mon Feb 3 14:02:58 2014 -0500
> >
> > kernfs: remove kernfs_addrm_cxt
> >
> > where we started taking kernfs_mutex when adding/removing sysfs files.
> > Ultimately I think we may have to change switching protocol to a work
> > item to be done asynchronously from writing to sysfs attribute.
>
> Well "holding a lock while adding/removing sysfs files and holding the
> same lock from sysfs read/write callbacks" is a classic deadlock. I've
> made lockdep shut up about it, I have no idea how to fix it properly.
> kernfs switching it's locking doesn't change anything here.
>
> Generally the fix is to untangle the locking: You need 1 set of locks
> to protect the datastructures exposed through sysfs, and another thing
> (maybe that's provided by serio already, I have no idea) to make sure
> you're ordering connect/disconnect handling correct and there's not
> concurrent calls to this hooks. Assuming serio does give that
> guarantee then there's no need to hold the lock while unregistering
> the sysfs files (we could perhaps push the lock dropping down into
> trackpoint_disconnect, but that's less maintainable imo since it's not
> obvious in psmouse_disconnect what's going on), and my patch is
> correct.
>
> But I didn't do a full locking audit of what exactly serio guarantees,
> and what exactly psmouse_mutex needs to protect (and where
> psmouse_mutex is only held because it's convenient).

Ping? Just noticed that I still have this bugfix hanging around.
bugfix as in "make lockdep more useful", not necessarily fixing the
locking here properly. I guess I could respin to add a FIXME comment,
but not sure how useful that would be.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch