Re: [PATCH v7 09/12] sysfs: fix deadlock race with module removal

From: Luis Chamberlain
Date: Mon Sep 20 2021 - 21:42:53 EST


On Mon, Sep 20, 2021 at 02:36:38PM -0700, Bart Van Assche wrote:
> On 9/17/21 10:04 PM, Luis Chamberlain wrote:
> > A sketch of how this can happen follows:
> >
> > CPU A                              CPU B
> > whatever_store()
> >                                    module_unload
> >                                      mutex_lock(foo)
> > mutex_lock(foo)
> >                                      del_gendisk(zram->disk);
> >                                        device_del()
> >                                          device_remove_groups()
> >
> > In this situation whatever_store() is waiting for the mutex foo to
> > become unlocked, but that won't happen until module removal is complete.
> > But module removal won't complete until the sysfs file being poked
> > completes which is waiting for a lock already held.
>
> If I remember correctly I encountered the deadlock scenario described
> above for the first time about ten years ago while working on the SCST
> project. We solved this deadlock by removing the sysfs attributes from
> the module unload code before grabbing mutex_lock(foo), e.g. by calling
> sysfs_remove_file().

Well, the sysfs attributes in zram do tons of funky mucking around, so
unfortunately no. zram is also not the only driver where this can happen,
which is why I decided to work on a generic solution instead.

> This works because calling sysfs_remove_file()
> multiple times in a row is safe. Is that solution good enough for the
> zram driver?

The sysfs attributes are group attributes that belong to the block
device, so they are removed on the driver's behalf by del_gendisk().
So unfortunately no, this would not be a good solution in this case.

Luis