Re: [PATCH v3] drivers/base/core: refcount kobject and bus on device attribute read / store
From: Greg KH
Date: Wed Jun 23 2021 - 12:51:47 EST
On Wed, Jun 23, 2021 at 09:14:34AM -0700, Luis Chamberlain wrote:
> kernfs / sysfs is in not struct device specific, it has no semantics for
> modules or even devices. The aspect of kernfs which deals with locking
> a struct kernfs_node is kernfs_get_active(). The refcount there of
> importance is the call to atomic_inc_unless_negative(&kn->active).
>
> struct kernfs_node *kernfs_get_active(struct kernfs_node *kn)
> {
> if (unlikely(!kn))
> return NULL;
>
> if (!atomic_inc_unless_negative(&kn->active))
> return NULL;
>
> if (kernfs_lockdep(kn))
> rwsem_acquire_read(&kn->dep_map, 0, 1, _RET_IP_);
> return kn;
> }
>
> BTW this was also precisely where I had suggested to extend the
> kernfs_node with an optional kn->owner and if set we try_module_get() to
> prevent the deadlock case if the module exit routine also happens to use
> a lock also used by a sysfs attribute store/read.
>
> The flow would be (from a real live gdb crash backtrace from an original
> bug report from a customer):
>
> write system call -->
> ksys_write ()
> vfs_write() -->
> __vfs_write() -->
> kernfs_fop_write() (note now this is kernfs_fop_write_iter()) -->
> sysfs_kf_write() -->
> dev_attr_store() -->
> null reference
>
> The dereference is because the dev_attr_store() call which we are
> modifying
>
> LINE-001 static ssize_t dev_attr_store(struct kobject *kobj, struct attribute *attr,
> LINE-002 const char *buf, size_t count)
> LINE-003 {
> LINE-004 struct device_attribute *dev_attr = to_dev_attr(attr);
> LINE-005 struct device *dev = kobj_to_dev(kobj);
> LINE-006 ssize_t ret = -EIO;
> LINE-007
> LINE-008 if (dev_attr->store)
> LINE-009 ret = dev_attr->store(dev, dev_attr, buf, count);
> ...
> }
>
> The race happens because a sysfs store / read can be triggered, the CPU
> could be preempted after LINE-008, and the ->store is gone by LINE-009.
> This begs the question if kernfs_fop_write_iter() or sysfs protects this
> somehow? Let's see.
>
> For kernfs we have:
>
> static ssize_t kernfs_fop_write_iter(struct kiocb *iocb, struct iov_iter *iter)
> {
> struct kernfs_open_file *of = kernfs_of(iocb->ki_filp);
> ...
> mutex_lock(&of->mutex);
> if (!kernfs_get_active(of->kn)) {
> mutex_unlock(&of->mutex);
> len = -ENODEV;
> goto out_free;
> }
>
> ops = kernfs_ops(of->kn);
> if (ops->write)
> len = ops->write(of, buf, len, iocb->ki_pos);
> else
> len = -EINVAL;
>
> kernfs_put_active(of->kn);
> mutex_unlock(&of->mutex);
> ...
> }
>
> And the write call here is a syfs calls:
>
> static ssize_t sysfs_kf_write(struct kernfs_open_file *of, char *buf,
> size_t count, loff_t pos)
> {
> const struct sysfs_ops *ops = sysfs_file_ops(of->kn);
> struct kobject *kobj = of->kn->parent->priv;
>
> if (!count)
> return 0;
>
> return ops->store(kobj, of->kn->priv, buf, count);
> }
>
> As we have observed already, the active reference obtained through
> kernfs_get_active() was for the struct kernfs_node. Sure, the
> ops->write() is valid, in this case it sysfs_kf_write().
>
> sysfs isn't doing any active reference check for the kobject device
> attribute as it doesn't care for them. So syfs calls
> dev_attr_store(), but the dev_attr_store() is not preventing the device
> attribute ops to go fishing, and we destroy them while we're destroying
> the device on module removal.
Ah, but sysfs _should_ be doing this properly.
I think the issue is that when we store the kobject pointer in kernfs,
it is NOT incremented. Look at sysfs_create_dir_ns(), if we call
kobject_get(kobj) right before we call kernfs_create_dir_ns(), and then
properly clean things up later on when we remove the sysfs directory
(i.e. the kobject), it _should_ fix this problem.
Then, we know, whenever show/store/whatever is called, when we cast out
of kernfs the private pointer to a kobject, that the kobject really is
still alive, so we can use it properly.
Can you try that, it should be a much "simpler" change here.
thanks,
greg k-h