Re: [RFC PATCH] livepatch: allow removal of a disabled patch

From: Miroslav Benes
Date: Thu May 05 2016 - 04:28:22 EST

Next message: Matthias Brugger: "Re: [alsa-devel] [PATCH 1/7] ASoC: mediatek: Refine mt8173 driver and change config option"
Previous message: Borislav Petkov: "Re: Is BIT() in arch/x86/include/uapi/asm/kvm.h defined? Where?"
In reply to: Josh Poimboeuf: "Re: [RFC PATCH] livepatch: allow removal of a disabled patch"
Next in thread: Josh Poimboeuf: "Re: [RFC PATCH] livepatch: allow removal of a disabled patch"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wed, 4 May 2016, Josh Poimboeuf wrote:

> On Wed, May 04, 2016 at 04:35:28PM +0200, Miroslav Benes wrote:
> > On Wed, 4 May 2016, Josh Poimboeuf wrote:
> >
> > > On Wed, May 04, 2016 at 01:58:47PM +0200, Miroslav Benes wrote:
> > > > On Tue, 3 May 2016, Josh Poimboeuf wrote:
> > > >
> > > > > On Tue, May 03, 2016 at 09:39:48PM -0500, Josh Poimboeuf wrote:
> > > > > > On Wed, May 04, 2016 at 12:31:12AM +0200, Jiri Kosina wrote:
> > > > > > > On Tue, 3 May 2016, Josh Poimboeuf wrote:
> > > > > > >
> > > > > > > > > 1. Do we really need a completion? If I am not missing something
> > > > > > > > > kobject_del() always waits for sysfs callers to leave thanks to kernfs
> > > > > > > > > active protection.
> > > > > > > >
> > > > > > > > What do you mean by "kernfs active protection"? I see that
> > > > > > > > kernfs_remove() gets the kernfs_mutex lock, but I can't find anywhere
> > > > > > > > that a write to a sysfs file uses that lock.
> > > > > > > >
> > > > > > > > I'm probably missing something...
> > > > > > >
> > > > > > > I don't want to speak on Miroslav's behalf, but I'm pretty sure that what
> > > > > > > he has on mind is per-kernfs_node active refcounting kernfs does (see
> > > > > > > kernfs_node->active, and especially it's usage in __kernfs_remove()).
> > > > > > >
> > > > > > > More specifically, execution of store() and show() sysfs callbacks is
> > > > > > > guaranteed (by kernfs) to happen with that particular attribute's active
> > > > > > > reference held for reading (and that makes it impossible for that
> > > > > > > attribute to vanish prematurely).
> > > > > >
> > > > > > Thanks, that makes sense.
> > > > > >
> > > > > > So what exactly is the problem the completion is trying to solve? Is it
> > > > > > to ensure that the kobject has been cleaned up before it returns to the
> > > > > > caller, in case the user wants to call klp_register() again after
> > > > > > unregistering?
> > > > > >
> > > > > > If so, that's quite an unusual use case which I think we should just
> > > > > > consider unsupported. In fact, if you try to do it, kobject_init()
> > > > > > complains loudly because kobj->state_initialized is still 1 because
> > > > > > kobjects aren't meant to be reused like that.
> > > > >
> > > > > ... and now I realize the point is actually to prevent the caller from
> > > > > freeing klp_patch before kobject_cleanup() runs.
> > > >
> > > > Exactly. Sorry I was so brief.
> > > >
> > > > > So yeah, it looks like we need the completion in case
> > > > > CONFIG_DEBUG_KOBJECT_RELEASE is enabled.
> > > > >
> > > > > Or alternatively we could convert patch->kobj to be dynamically
> > > > > allocated instead of embedded in klp_patch.
> > > >
> > > > But that wouldn't help, would it? Dynamic kobjects registers generic
> > > > release function dynamic_kobj_release() and that's it. We're in the same
> > > > situation. I have got a feeling that dynamic kobjects are only for trivial
> > > > cases.
> > >
> > > But the patch release doesn't need to do anything, right?
> >
> > That is correct. I wanted to point out that dynamic_kobj_release() did not
> > really solve our "completion" issue. If there is a problem in our code and
> > we need completion, dynamic kobjects would not help. If we don't need a
> > completion at all we could move to dynamic kobjects.
>
> Why would we need a completion if we had a dynamic kobject? The
> lifetime of the kobject no longer matters if it's separated from the
> klp_patch struct. The user can go ahead and free the klp_patch before
> the kobject gets freed, no problem.

See below.

> > There is still container_of() though.
>
> Yes, but we could still find the corresponding klp_patch by searching
> through the klp_patches list. It's not ideal, but IMO it would be
> better than needing the completion to account for our "special"
> situation.

That would be a solution.

> > > > Moreover we use container_of() several times in the code and that does not
> > > > work with dynamically allocated kobjects.
> > > >
> > > > Anyway I am really confused now. When I read changelog of c817a67ecba7
> > > > ("kobject: delayed kobject release: help find buggy drivers") all makes
> > > > perfect sense. But isn't our situation somewhat special, because we have
> > > > refcounts completely under control? So we know that once we call
> > > > kobject_put() we can let a patch go... I must be missing something.
> > > >
> > > > It does not make sense to introduce completion just to satisfy a feature
> > > > which was introduced to debug general cases.
> > >
> > > I think our situation is "special" because klp_patch and its embedded
> > > kobject have separate lifetimes. We have a kobject, which we own, which
> > > is embedded in a klp_patch struct, which we don't own.
> > >
> > > If I understand correctly, normally when you release a kobject, the
> > > containing struct gets freed. But we can't do that here because the
> > > caller allocated the klp_patch.
> >
> > Normally only struct kobject is freed, no?
> >
> > From Documentation/kobject.txt:
> >
> > "One important point cannot be overstated: every kobject must have a
> > release() method, and the kobject must persist (in a consistent state)
> > until that method is called. If these constraints are not met, the code is
> > flawed."
> >
> > So we need to only make sure that klp_patch does not disappear before
> > calling release() method. In our case that cannot happen even without
> > completion, because we call kobject_put() in klp_unregister_patch() when
> > the patch module is going and kobject_put() calls our release (potentially
> > empty) method in a "sync" way. If I read that code correctly.
> >
> > This does not hold only if CONFIG_DEBUG_KOBJECT_RELEASE=y.
>
> Hm, maybe I'm missing your point, or maybe you're missing mine. Maybe
> both :-)
>
> I understand why your patch has a completion. And I understand that
> it's needed because of CONFIG_DEBUG_KOBJECT_RELEASE. But even then I
> think it's only needed *if* the kobject is embedded in the klp_patch
> struct.
>
> In my experience, usually the point of having a kobject release function
> is so you can kfree the kobject's containing struct. That doesn't
> conflict with your quote from kobject.txt.
>
> But in our "special" code, klp_patch (and thus the kobject itself) is
> allocated by the caller. But its embedded kobject is initialized by
> livepatch code. In my experience, it's highly unusual to have the
> kobject allocated by one party and initialized by another.
>
> That's why I think it would be much more logical to turn patch->obj into
> a pointer. Then we can allocate it *and* initialize it, so that we can
> 100% control its lifetime and not have to worry about the low-level
> details of how kobject_del() works, completions,
> CONFIG_DEBUG_KOBJECT_RELEASE, etc. And we would be more in line with
> the principles of kobject, as I understand them ;-)

I think it boils down to the following problem.

1. CONFIG_DEBUG_KOBJECT_RELEASE=y

2. we have dynamic kobjects, so there is a pointer in klp_patch to struct
kobject

3. it is allocated during klp_init_patch() and all is fine

4. now we want to remove the patch module. It is disabled and module_put()
is called. User calls rmmod on the module.

5. klp_unregister_patch() is called in __exit method.

6. klp_free_patch() is called.

7. kobject_put(patch->kobj) is called.

...now it gets interesting...

8. among others kobject_cleanup() is scheduled as a delayed work (this is
important).

9. there is no completion, so kobject_put returns and the module goes
away.

10. someone calls patch enabled_store attribute (for example). They can
because kobject_cleanup() has not been called yet. It is delayed
scheduled.

...crash...

Is this scenario possible? I think it is. And I think it is possible
regardless dynamic/static allocation of the kobject in klp_patch (static
allocation scenario was my concern at the beginning of this thread).

I think it is not possible when CONFIG_DEBUG_KOBJECT_RELEASE=n. Again
regardless of dynamic/static situation.

So if it is possible we need the completion just because of
CONFIG_DEBUG_KOBJECT_RELEASE. Which is unfortunate.

Does all this make sense now?

Miroslav

Next message: Matthias Brugger: "Re: [alsa-devel] [PATCH 1/7] ASoC: mediatek: Refine mt8173 driver and change config option"
Previous message: Borislav Petkov: "Re: Is BIT() in arch/x86/include/uapi/asm/kvm.h defined? Where?"
In reply to: Josh Poimboeuf: "Re: [RFC PATCH] livepatch: allow removal of a disabled patch"
Next in thread: Josh Poimboeuf: "Re: [RFC PATCH] livepatch: allow removal of a disabled patch"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]