[PATCH 1/2] drivers: core: Don't try to use a dead glue_dir

From: Benjamin Herrenschmidt
Date: Thu Jun 28 2018 - 22:22:07 EST


Under some circumstances (such as when kobject debugging is enabled),
a glue dir whose kref is 0 may remain in the class kset for
a long time. This is because we don't actively remove glue dirs
when they become empty; instead we rely on the implicit removal
done by kobject_release(), which can run some time after the
last kobject_put().

Using such a dead object is a bad idea and will lead to warnings
and crashes.

Unfortunately, that can happen in get_device_parent() if the
last child of a glue dir was removed and a new one added
before the glue dir has been fully released.

This patch prevents that by making get_device_parent() only "find"
a glue dir whose refcount is non-zero.

While this fixes the crash, it doesn't fully fix the problem.
Instead, the race will now result in an error when attempting to
create a duplicate file name in sysfs. A fix for that will come
separately.

Signed-off-by: Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>
---

(Adding lkml: I just realized I completely forgot to CC it in
the first place on this whole conversation. Blame the 1am debugging
session.)

drivers/base/core.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index b610816eb887..e9eff2099896 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1517,11 +1517,13 @@ static struct kobject *get_device_parent(struct device *dev,
 
 		/* find our class-directory at the parent and reference it */
 		spin_lock(&dev->class->p->glue_dirs.list_lock);
-		list_for_each_entry(k, &dev->class->p->glue_dirs.list, entry)
+		list_for_each_entry(k, &dev->class->p->glue_dirs.list, entry) {
 			if (k->parent == parent_kobj) {
-				kobj = kobject_get(k);
-				break;
+				kobj = kobject_get_unless_zero(k);
+				if (kobj)
+					break;
 			}
+		}
 		spin_unlock(&dev->class->p->glue_dirs.list_lock);
 		if (kobj) {
 			mutex_unlock(&gdp_mutex);