Re: [PATCH v2 4/4] test_sysfs: demonstrate deadlock fix

From: Greg KH
Date: Wed Jul 21 2021 - 07:42:46 EST


On Sat, Jul 03, 2021 at 10:28:28AM -0700, Luis Chamberlain wrote:
> On Sat, Jul 03, 2021 at 06:49:46AM +0200, Greg KH wrote:
> > On Fri, Jul 02, 2021 at 05:46:32PM -0700, Luis Chamberlain wrote:
> > > +#define MODULE_DEVICE_ATTR_FUNC_STORE(_name) \
> > > +static ssize_t module_ ## _name ## _store(struct device *dev, \
> > > + struct device_attribute *attr, \
> > > + const char *buf, size_t len) \
> > > +{ \
> > > + ssize_t __ret; \
> > > + if (!try_module_get(THIS_MODULE)) \
> > > + return -ENODEV; \
> > > + __ret = _name ## _store(dev, attr, buf, len); \
> > > + module_put(THIS_MODULE); \
> > > + return __ret; \
> > > +}
> >
> > As I have pointed out before, doing try_module_get(THIS_MODULE) is racy
> > and should not be added back to the kernel tree. We got rid of many
> > instances of this "bad pattern" over the years, please do not encourage
> > it to be added back as others will somehow think that it correct code.
>
> It is noted this is used in lieu of any agreed upon solution to
> *demonstrate* how this at least does fix it. In this case (and in the
> generic solution I also had suggested for kernfs a while ago), if the
> try fails, we give up. If it succeeds, we now know we can rely on the
> device pointer. If the refcount succeeds, can the module still not
> be present? Is try_module_get() racy in that way? In what way is it
> racy and where is this documented? Do we have a selftest to prove the
> race?

As I say in the other email where you tried to add this, think about
what happens if the module is removed _right before_ you make this call.

Or a few instructions before that. The race is still there, this fixes
nothing except make the window smaller.

thanks,

greg k-h