Re: [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier

From: Benjamin Herrenschmidt
Date: Sun Jul 01 2018 - 03:19:16 EST


On Sat, 2018-06-30 at 20:57 -0700, Linus Torvalds wrote:
> On Sat, Jun 30, 2018 at 8:42 PM Benjamin Herrenschmidt
> <benh@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > But what if something *else* still holds a reference to the kobject ?
> > It could be anything really... t
>
> But that's fine. Then the object will continue to exist, and the sysfs
> file will continue to exist, and you won't get a new glue directory.

I suspect you didn't read it my entire argument or I wasn't clear
enough :-) This is actually the crux of the problem:

Yes the object continues to exist. However, the *last* kobject_put to
it will no longer be done under whatever higher level locking the
subsystem provides (whatever prevents for example concurrent add and
removes).

Thus in that scenario the "last minute" kobject_release() done by the
last kobject_put() will be effectively unprotected from for example the
gdp_mutex (in the case of the gluedirs) or whatever other locking the
subsystem owning the kobject is using to avoid making that "refount 0"
object "discoverable".

So not only the conitnued existence of the object will potentially
trigger the duplicate sysfs name problem, but it *will* also re-open
the window for the object to be temporarily be visible with a 0
refcount.

The kobject debugging doesn't create a new race here. It just
significantly increase the size of an existing race window.

You argued yourself that the reason that window is a non-issue without
kobject debugging is that the expectation is that the release happens
synchronously with the last kobject_put done by device_del, which has
appropriate higher-level locking.

My point is that this specific assumption doesn't hold in the case
where another reference exists. In that case the object will be freed
with the race open for others to observe it while its refcount is 0.

> In fact, what you describe is a problem with *your* patch, exactly
> because you introduce another counter that is *not* the reference
> count, and now you have the problem with "old directory kobject is
> still live, but I removed the sysfs part, and now I'm creating a new
> object with the same name".
>
> Hmm?

So my patch 1/2 prevents us from finding the old dying object (and thus
from crashing) but replaces this with the duplicate name problem.

My patch 2/2 removes that second problem by ensuring we remove the
object from sysfs synchronously in device_del when it no longer
contains any children, explicitely rather than implicitely by the
virtue of doing the "last" kobject_put.

You'll notice the existing code wraps that kobject_put() with the
gdp_mutex, specifically I suppose to synronize with the creation path
of the gluedir, ie with the expectation that this kobject_put() will
synchronously remove the object from the kset.

I argue this assumption is flawed, that another reference could delay
the removal from the kset to a point where the mutex isn't held, and
thus the race re-opens.

My patch doesn't implement a separate "refcount" per-se, it implements
a "childcount". This is akin to mm_users vs mm_count.

We want the glue dir to be removed when it has no more children, we
currently "conflate" this with the kobject having no reference. This is
what I argue is incorrect. The kobject can have "unrelated" references,
that define the lifetime of the object in memory, but it's presence in
sysfs is dictated by the number of children it contains.

> > It is and there's a WARN_ON about it inside kobject_get(). I don't
> > think anybody argues against that, you are absolutely right.
>
> No. That the zero kobject_get() will not result in a warning. It just
> does a kref_get(), no warnings anywhere.

It's there but it's in refcount:

void refcount_inc(refcount_t *r)
{
WARN_ONCE(!refcount_inc_not_zero(r), "refcount_t: increment on 0; use-after-free.\n");
}
EXPORT_SYMBOL(refcount_inc);

In fact that's how I started digging into that problem in the first place :-)

> Yes, there is a kobject_get() warning, but that's about an
> _uninitialized_ kobject, not a already released one! You will get no
> warning that I can see from the "oh, you just got a stale one".

Cheers,
Ben.

> Linus