Re: [PATCH v3] drivers: core: Remove glue dirs early only when refcount is 1

From: Greg KH
Date: Mon May 06 2019 - 02:22:09 EST


On Mon, May 06, 2019 at 10:41:34AM +0530, Prateek Sood wrote:
> On 5/1/19 5:29 PM, Prateek Sood wrote:
> > While loading firmware blobs in parallel from different threads, one thread
> > can free the sysfs node of a glue_dir in device_del() while another thread
> > is adding a subdirectory under that same glue_dir from device_add().
> >
> > CPU1                                   CPU2
> > fw_load_sysfs_fallback()
> >   device_add()
> >     get_device_parent()
> >       class_dir_create_and_add()
> >         kobject_add_internal()
> >           create_dir() // glue_dir
> >
> >                                        fw_load_sysfs_fallback()
> >                                          device_add()
> >                                            get_device_parent()
> >                                              kobject_get() //glue_dir
> >
> >   device_del()
> >     cleanup_glue_dir()
> >       kobject_del()
> >
> >                                          kobject_add()
> >                                            kobject_add_internal()
> >                                              create_dir() // in glue_dir
> >                                                kernfs_create_dir_ns()
> >
> >       sysfs_remove_dir() //glue_dir->sd=NULL
> >       sysfs_put() // free glue_dir->sd
> >
> >                                                  kernfs_new_node()
> >                                                    kernfs_get(glue_dir)
> >
> > Fix this race by making sure that the kernfs_node for glue_dir is released
> > only when the refcount of the glue_dir kobject is 1.
> >
> > Signed-off-by: Prateek Sood <prsood@xxxxxxxxxxxxxx>
> >
> > ---
> >
> > Changes from v2->v3:
> > - Added patch version change related comments.
> >
> > Changes from v1->v2:
> > - Updated callstack from _request_firmware_load() to fw_load_sysfs_fallback().
> >
> >
> > drivers/base/core.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index 4aeaa0c..3955d07 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -1820,12 +1820,15 @@ static inline struct kobject *get_glue_dir(struct device *dev)
> >   */
> >  static void cleanup_glue_dir(struct device *dev, struct kobject *glue_dir)
> >  {
> > +	unsigned int refcount;
> > +
> >  	/* see if we live in a "glue" directory */
> >  	if (!live_in_glue_dir(glue_dir, dev))
> >  		return;
> >
> >  	mutex_lock(&gdp_mutex);
> > -	if (!kobject_has_children(glue_dir))
> > +	refcount = kref_read(&glue_dir->kref);
> > +	if (!kobject_has_children(glue_dir) && !--refcount)
> >  		kobject_del(glue_dir);
> >  	kobject_put(glue_dir);
> >  	mutex_unlock(&gdp_mutex);
> >
>
> Folks,
>
> Please share feedback on the race condition and the patch to
> fix it.

Please relax, we will get to this eventually, it has only been a week...

greg k-h
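
The condition added by the quoted patch can be illustrated with a small
userspace model. This is only a sketch, not kernel code: struct
glue_dir_model, model_get() and model_cleanup() are hypothetical names made
up for illustration, and the gdp_mutex locking is left out because the model
is single-threaded. It only mirrors the "no children and refcount is 1" test
from the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the glue directory kobject. */
struct glue_dir_model {
	unsigned int refcount;	/* models kref_read(&glue_dir->kref) */
	unsigned int children;	/* models entries below the sysfs directory */
	bool in_sysfs;		/* false once the modelled kobject_del() ran */
};

/* Models get_device_parent() finding an existing glue dir: kobject_get(). */
static void model_get(struct glue_dir_model *g)
{
	g->refcount++;
}

/*
 * Models cleanup_glue_dir() with the patch applied. The caller is about to
 * drop its own reference, so the directory is removed early only when that
 * reference is the last one (refcount == 1) and no child has been added.
 */
static void model_cleanup(struct glue_dir_model *g)
{
	unsigned int refcount = g->refcount;

	assert(refcount > 0);
	if (g->children == 0 && !--refcount)
		g->in_sysfs = false;	/* models kobject_del() */

	g->refcount--;			/* models kobject_put() */
	if (g->refcount == 0)
		g->in_sysfs = false;	/* final put also removes the directory */
}
```

In the scenario from the trace, the model starts with refcount 1 (CPU1),
CPU2 calls model_get(), and CPU1 then calls model_cleanup(). Because the
count is 2 at that point, the directory stays in place and CPU2 can still
add its subdirectory. Only the holder of the last reference removes it.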