Re: [PATCH 2/2] drivers: core: Remove glue dirs from sysfs earlier

From: Benjamin Herrenschmidt
Date: Mon Jul 02 2018 - 20:58:10 EST


On Mon, 2018-07-02 at 12:24 -0700, Linus Torvalds wrote:
> On Mon, Jul 2, 2018 at 3:23 AM Benjamin Herrenschmidt
> <benh@xxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Let me try one last ditch attempt to convince you using maybe a
> > different perspective: this is how sysfs is intended to work and how
> > the device model already does everywhere else except the gluedirs.
>
> So don't get me wrong. I don't think that patch is _wrong_. It may
> well be the best thing we can do now.
>
> I just think some of the arguments for the patch are bogus.
>
> I still think that the auto-cleanup is actually a good thing in
> general. Not because it simplifies things (which it can do), but
> because it fundamentally *allows* you to use less locking.
>
> And that ends up being my real reason to not like your patch all that
> much. It depends on
>
> (a) an extra reference count

Right, that is not that great, and it might be possible to avoid it if
we have a way to check that the sysfs dir is empty. Maybe kobj->sd-
>subdirs ? This will find subdirs, not files, but for gluedirs that's
sufficient. I can play with that today see if it works.

BTW. Talking of extra reference counts, we already have two everywhere
afaik... kernfs has its own that's separate from the kobject one. I got
lost in that maze a couple of times already.

> (b) on fairly heavy-handed locking (ie the whole "lock on release
> too, not just on allocation").

This is already there, I'm not adding it ;-) It's just ineffective
without my patch due to the possibility of the "auto-cleanup" to leak
outside the lock.

> and I think both of those are non-optimal.
>
> So:
>
> > The actual lifetime of the struct device is handled *separately* by
> > device_get/put which are just wrappers on kobject_get/put.
>
> I agree that that is the end result of your patch (and perhaps the
> buggy intent of the original code).

This is the existing device_add/del without my patch. I was describing
how the core device stuff works for the normal device directories
today. My patch makes the gludirs do the same thing basically, which I
think was the intend of the code already but was broken.

> I just don't necessarily agree it's a great thing in general. It
> basically uses sysfs visibility as an argument for why lifetimes
> should differ from the refcounted lifetimes.

Yes, it does that so that we can remove and re-add without name
clashes.

> And that may be a practical argument, but I don't think it's a very
> good argument in general. I think it would arguably be much better to
> be less serialized, and depend on refcounting more, and make less of a
> deal about the sysfs visibility.
>
> For example, if we *really* add back the exact same device immediately
> after removing it, and the device was still in use somehow (ie the
> refcount never went down to zero), maybe we really should not have
> reused the same name? Or if we do re-use the same name, maybe we
> should have re-used the device node itself?

Well, maybe but that's what we've been doing forever, and I think
drivers deal with it already. It applies to devices and to chardevs for
example.

> See what I'm saying? I understand where your patch is coming from, but
> I am suspicious of the fact that it seems to want to re-use a name
> (but not the object) that is by definition still in use.

Yes, I see what you mean.

I don't really have the same qualms as you do, it's how we've done
things forever (it's just that the gluedir code was broken), and it
generally works well.

It's not different in principle to what we do with files. You can
unlink an open file. It's still "in use" but the directory entry is
gone and one can create a new one.

It a slightly more roundabout way, we do that with mm_structs, the last
process can be gone but we keep them around due to being referenced (or
lazily attached to a kthread). Here we have 2 counters too.

> Maybe that's the right thing in this case. Considering that we have a
> real oops and a real problem, and I don't have an alternative patch, I
> guess the "patches talk, bullshit walks" rule applies.

Haha, as you prefer. Patch 1 fixes the oops, patch 2 fixes the name
collision, so in theory you could get away with just patch 1. But then
device would start failing to be re-added or re-bound.

(In my case it was a sub-device created by the driver of a parent
device, so it happens on bind/unbind of that parent driver).

I'll have a look see if I can find a way to check that the sysfs dir is
empty without adding that childcount, that would at least alleviate
some of your objection, and would probably make the patch
smaller/simpler.

Cheers,
Ben.