Re: [PATCH] drivers/base: use a worker for sysfs unbind

From: Daniel Vetter
Date: Mon Dec 10 2018 - 05:18:39 EST


On Mon, Dec 10, 2018 at 11:06:34AM +0100, Greg Kroah-Hartman wrote:
> On Mon, Dec 10, 2018 at 09:46:53AM +0100, Daniel Vetter wrote:
> > Drivers might want to remove some sysfs files, which needs the same
> > locks and ends up angering lockdep. Relevant snippet of the stack
> > trace:
> >
> > kernfs_remove_by_name_ns+0x3b/0x80
> > bus_remove_driver+0x92/0xa0
> > acpi_video_unregister+0x24/0x40
> > i915_driver_unload+0x42/0x130 [i915]
> > i915_pci_remove+0x19/0x30 [i915]
> > pci_device_remove+0x36/0xb0
> > device_release_driver_internal+0x185/0x250
> > unbind_store+0xaf/0x180
> > kernfs_fop_write+0x104/0x190
> >
> > I've stumbled over this because some new patches by Ram connect the
> > snd-hda-intel unload (where we do use sysfs unbind) with the locking
> > chains in the i915 unload code (but without creating a new loop),
> > which upset our CI. But the bug is already there and can be easily
> > reproduced by unbinding i915 directly.
>
> This is odd, why wouldn't any driver hit this issue? And why now since
> you say this is triggerable today?

The above backtrace is triggered by unbinding i915 on current upstream
kernels. Note: this will crash rather badly later on in the
fbdev/fbcon/vtconsole hell, but that's a separate issue (which can be worked
around by first unbinding fbcon manually through sysfs).
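
For reference, the reproduction sequence described above can be sketched
roughly as follows. This is only a sketch under assumptions that are not in
this mail: the GPU is assumed to sit at PCI address 0000:00:02.0 and fbcon is
assumed to be vtcon1 (check /sys/class/vtconsole/vtcon*/name on the actual
machine). It defaults to a dry run that only prints the sysfs writes; doing
them for real needs root, and the splat only shows up on a lockdep-enabled
kernel.

```shell
#!/bin/sh
# Sketch: unbind fbcon, then unbind i915 through sysfs.
# ASSUMPTIONS (not from the original mail): GPU at 0000:00:02.0, fbcon is vtcon1.

GPU_BDF="${GPU_BDF:-0000:00:02.0}"   # assumed PCI address of the i915 device
VTCON="${VTCON:-vtcon1}"             # assumed vtconsole that is backed by fbcon
DRY_RUN="${DRY_RUN:-1}"              # 1 = only print the writes, 0 = perform them

# Write a value to a sysfs attribute, or print the equivalent command
# when running in dry-run mode.
sysfs_write() {
    if [ "$DRY_RUN" = "1" ]; then
        printf 'echo %s > %s\n' "$1" "$2"
    else
        printf '%s\n' "$1" > "$2"
    fi
}

# Detach fbcon first, to sidestep the later fbdev/fbcon/vtconsole crash.
unbind_fbcon() {
    sysfs_write 0 "/sys/class/vtconsole/$VTCON/bind"
}

# Unbind the i915 PCI driver; this write is what ends up in unbind_store().
unbind_i915() {
    sysfs_write "$GPU_BDF" "/sys/bus/pci/drivers/i915/unbind"
}

unbind_fbcon
unbind_i915
```

Run it as-is to see the two writes, or with DRY_RUN=0 as root to actually
perform them, then look at dmesg for the lockdep report.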

> I know scsi was doing some strange things like trying to remove the
> device itself from a sysfs callback on the device, which requires it to
> just call a different kobject function created just for that type of
> thing. Would that also make sense to do here instead of your workqueue?

Note how we blow up while unregistering software device instances that i915
supports in entirely different subsystems (acpi_video in the trace above). I
guess most drivers only have sysfs files for their own stuff, where removal
is done as you describe. The problem is that there's an awful lot of
unrelated stuff hanging off i915.

Or maybe acpi_video is busted, and should be using a different function.
You haven't said which one, and I have no idea which one it is ...

And in case the context wasn't clear: this is unbinding the i915 PCI
driver, which triggers the lockdep recursion splat above.

Thanks, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch