Re: [PATCH 1/2] zram: fix crashes due to use of cpu hotplug multistate

From: Greg KH
Date: Sat Apr 03 2021 - 02:13:37 EST


On Fri, Apr 02, 2021 at 06:30:16PM +0000, Luis Chamberlain wrote:
> On Fri, Apr 02, 2021 at 09:54:12AM +0200, Greg KH wrote:
> > On Thu, Apr 01, 2021 at 11:59:25PM +0000, Luis Chamberlain wrote:
> > > As for the sysfs deadlock possible with drivers, this fixes it in a generic way:
> > >
> > > commit fac43d8025727a74f80a183cc5eb74ed902a5d14
> > > Author: Luis Chamberlain <mcgrof@xxxxxxxxxx>
> > > Date: Sat Mar 27 14:58:15 2021 +0000
> > >
> > > sysfs: add optional module_owner to attribute
> > >
> > > This is needed as otherwise the owner of the attribute
> > > or group read/store might have a shared lock used on driver removal,
> > > and deadlock if we race with driver removal.
> > >
> > > Signed-off-by: Luis Chamberlain <mcgrof@xxxxxxxxxx>
> >
> > No, please no. Module removal is a "best effort",
>
> Not for live patching. I am not sure if I am missing any other valid
> use case?

Live patching removes modules? We have so many code paths that are
"best effort" when it comes to module unloading; trying to resolve this
one is a valiant effort, but not realistic.

> > if the system dies when it happens, that's on you.
>
> I think the better approach for now is simply to call testers / etc to
> deal with this open coded. I cannot be sure that other than live
> patching there may be other valid use cases for module removal, and for
> races we really may care for where userspace *will* typically be mucking
> with sysfs attributes. Monitoring my systems' sysfs attributes I am
> actually quite surprised at the random pokes at them.
>
> > I am not willing to expend extra energy
> > and maintenance of core things like sysfs for stuff like this that does
> > not matter in any system other than a developer's box.
>
> Should we document this as well? Without this it is unclear that tons of
> random tests are sanely nullified. At least this deadlock I spotted can
> be a pretty common form on many drivers.

What other drivers have this problem?

> > Lock data, not code please. Trying to tie data structure's lifespans
> > to the lifespan of code is a tangled mess, and one that I do not want to
> > add to in any form.
>
> Driver developers will simply have to open code these protections. In
> light of what I see on LTP / fuzzing, I suspect the use case will grow
> and we'll have to revisit this in the future. But for now, sure, we can
> just open code the required protections everywhere to not crash on module
> removal.

LTP and fuzzing do not remove modules either, so I do not understand the
root problem here. That's just something that does not happen on a real
system.

thanks,

greg k-h