Re: [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier

From: Benjamin Herrenschmidt
Date: Sat Jun 30 2018 - 23:52:40 EST


(This is a resend with lkml added for background & archival purposes)

On Sat, 2018-06-30 at 12:29 -0700, Linus Torvalds wrote:
> On Thu, Jun 28, 2018 at 7:19 PM Benjamin Herrenschmidt
> <benh@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > For devices with a class, we create a "glue" directory between
> > the parent device and the new device with the class name.
> >
> > This directory is never "explicitely" removed when empty however,
> > this is left to the implicit sysfs removal done by kobjects when
> > they are released on the last kobject_put().
> >
> > This is problematic because as long as it's not been removed from
> > sysfs, it is still present in the class kset and in sysfs directory
> > structure.
>
> Ok, so I don't hate the patch per se, but looking around, I think this
> is still wrong in the end.
>
> Why?
>
> Because normally, the last kobject_put() really does do a synchronous
> kobject_del(). So normally, this is all completely pointless, and the
> normal kobject lifetime rules should be sufficient.

Not entirely sadly....

There's a small race between the kref going down to 0 and the last
kobject_put(). Something might still "catch" the 0-reference object
during that window.

In this specific case our bacon is mostly saved by the gdp_mutex which
is taken by cleanup_glue_dir() around what *should* be the lats
kobject_put.... but what if it isn't ?

If anything else happens to hold a reference to the directory object
(open files in sysfs maybe ?), then the last put will come from
elsewhere and will happen without that mutex being held, thus re-
opening the tiny race.

Is this possible ?

> So I *think* your problem happens because you have
> CONFIG_DEBUG_KOBJECT_RELEASE enabled, and that intentionally delays
> the cleanup.

I think it just opens an existing race more widely. The race always
exist becaues another CPU can observe the object between the reference
going to 0 and the last kobject_del done by kobject_release. That's one
main reason why I dislike this "auto-clean" mechanism.

One other way to solve it, which I just thought about, could be to,
inside kobject_put() itself, check that the reference is *1* and do
kobject_del() before the last kref_put. That does mean that somebody
can snatch it in that window after it's been removed from sysfs though,
is that ok ? It won't crash I suppose...

> This is actually not really what DEBUG_KOBJECT_RELEASE is really
> documented to do. It is documented as a "let's debug problems where
> drivers think deletion is immediate", but the sysfs interaction with
> the same-name issue really smells different.
>
> So what the patch does is basically to just fight
> DEBUG_KOBJECT_RELEASE delaying rules, and that kind of stinks.

Not entirely, it fight an existing race that DEBUG_KOBJECT_RELEASE just
opens more widely.

> To me, it really feels like either we should see the
> DEBUG_KOBJECT_RELEASE rules are "real" (in which case fighting them is
> wrong), or we should admit that DEBUG_KOBJECT_RELEASE causes problems
> (in which case we should probably try to fix the debug aid).
>
> Ben, can you confirm that your problem just goes away if you don't
> select DEBUG_KOBJECT_RELEASE?

The easily reproducable crash goes away, because the device I've been
observing these is a tiny slow single core ARM embedded thing. But I'm
not sure the therical race is solved.

> Greg - comments? The pattern of "remove last device, add a new device
> of the same class" really seems to be a valid pattern, and
> CONFIG_DEBUG_KOBJECT_RELEASE seems to actively break it.
>
> Could we perhaps add a synthetic test for exactly this pattern (add a
> silly device with a bogus class, remove it, and add another device
> with the same class)?
>
> Linus