Re: [PATCH 0/3] refcounting improvements in sysfs.

From: Neil Brown
Date: Mon Mar 29 2010 - 01:11:16 EST


On Fri, 26 Mar 2010 15:32:51 +0900
Tejun Heo <teheo@xxxxxxx> wrote:

> Hello, Neil.
>
> On 03/26/2010 03:02 PM, Neil Brown wrote:
> >> Nice article. In general, yeap, I agree it would be nice to have a
> >> working reference count abstraction. However, kref along with kobject
> >> is a good example of obscurity by abstraction anti pattern. :-)
> >
> > I'm not at all sure that opinion would be universal....
> >
> > refcounting is something that it is quite easy to get wrong. There are
> > several slightly different models for refcounting and if you don't have a
> > clear understanding of the different use cases it is easy to get confused
> > about exactly what model is being used and so use a refcount wrongly.
> > kref certainly doesn't cover all models for refcounting but it does cover one
> > fairly common one very well and I think that it's use bring clarity rather
> > than obscurity.
> > Of course if it is used for a refcount which should really follow a different
> > model then that can cause confusion...
>
> I don't know. After spending some time with k* and device model, I
> grew a pretty strong dislike for abstractions without clear functional
> necessities. kref being much simpler than kobject, the abuse is much
> less severe but there have been kref misuses (in kobject and SCSI
> midlayer) which could have been avoided or at least easily located if
> they simply had used atomic_t instead of dreaming up some mystical
> properties of kref.

Hi Tejun,

this strikes me as really valuable experience that it would be great to
share. While I generally like kref and see value in at least some of
kobject I don't for a moment suppose they are perfect. Fixing them requires
a good understanding of the problems they cause. If you have that knowledge,
it would be a great resource for anyone wanting to 'fix' kobject.
Are you interested in writing anything (more) up at all???


>
> >> kobject API incorrectly suggests that it deals with the last put
> >> problem. There still are large number of code paths which do the
> >> following,
> >>
> >> if (!(kob = kobject_get(kobj)))
> >> return;
> >
> > kobject_get *always* returns exactly the argument that was passed to it.
> > (kref_get doesn't have a return value.)
> >
> > I don't see how the code above has any bearing on the last-put
> > problem, which I think kref and thus kobject do handle exactly
> > correctly.
>
> Oh, I should have been more explicit. It's not directly related to
> kref but just something that always comes to my mind when thinking
> about k* abstractions. The above bogus condition checks used to be
> used quite widely. The programmer for some reason believed the last
> kobject_put() somehow will magically make future kobject_get()s return
> NULL, which of course doesn't make any sense but hey the
> implementation is buried kilometers deep, the API and other usages
> seem to suggest that and it's easy to imagine something up when you're
> tired.

This would argue that having a return value from kobject_get violates Rusty's
law that interfaces should be hard to misuse.
We could probably change that - it is only used 19 times in the current
kernel.
It probably doesn't help that Documentation/kobject.txt includes the text:

A successful call to kobject_get() will increment the kobject's reference
counter and return the pointer to the kobject.

which seems to suggest that an unsuccessful call is possible.


>
> As an another example, please take a look at the kref API.
>
> int kref_put(struct kref *kref, void (*release) (struct kref *kref));
>
> The function takes @release callback but under which context is it
> called? If you look at the source code, it's called in-line which
> isn't clear from the API at all (why the hell take a callback if
> you're gonna call it in-line?) and there have been *several* subtle
> bugs which could have been avoided or at least would have been caught
> much easier if it were not for that meaningless separate release
> callback. It's just too easy to forget about the executing context
> when people write and review stuff over function boundaries.
>
> void put_something(something)
> {
> if (kref_put(&something->kref))
> do something which might dead lock;
> }
>
> is way easier to avoid bugs and review than
>
> void really_kill_something(struct kref *kref)
> {
> struct my_something *something = container_of(...);
>
> do something which might dead lock;
> }
>
> void put_something(something)
> {
> kref_put(&someting->kref);
> }
>
> This is *way* worse than atomic_t not better and the confusion is
> caused exactly by superflous abstraction which leads the users of the
> API to imagine some non-existing function of the abstraction and
> hinders the flow of review.

I'm not immediately convinced by this, though maybe I haven't seen enough
examples yet.

The deadlocks that I have come across would not have been any more obvious in
either of the above - they were caused because sysfs_remove deadlocks when
called from inside a sysfs attribute action...

Also, while this re-write is possible for kref uses it isn't really possible
in kobject as the 'final_put' function must be included in the ktype (though
maybe you don't like that either).

What would be really helpful here is a survey of what sorts of things are
actually done in final_put functions so would could create some guidelines
about how to write the release functions.

Thanks,
NeilBrown


>
> >> I believe (or at least hope) the actual problem cases are mostly fixed
> >> now but there still are a lot of misconceptions around how stuff built
> >> on kref/kobject is synchronized and they sometimes lead to race
> >> conditions buried deep under several layers of abstractions and it
> >> becomes very hard to see those race conditions when they are buried
> >> deep.
> >
> > I agree that there probably misconceptions about how kref works and they are
> > probably based on a lack of appreciation of the subtle differences in
> > flavours of refcounts. Hence my desire to create and document different
> > k*ref types which clarify the different use cases.
>
> Oh, I'm not objecting to cleaning up how reference counts are done
> per-se but *PLEASE* refrain from introducing abstractions for
> abstraction's sake. The k* stuff, device model and sysfs already
> walked down that road and got burned badly.
>
> > BTW I'd be perfectly happy if the first patch was taken and
> > subsequent ones not. I think they are a good idea, but I'm happy to
> > forgo them (for now:-).
>
> If it can be done in a way that it doesn't substitute pure logical
> complexity with one which involves memory ordering issues, which is
> almost always way worse, I have no objection at all.
>
> Thanks.
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/