Re: livepatch: allow removal of a disabled patch

From: Jessica Yu
Date: Thu May 05 2016 - 20:42:32 EST

In reply to: Jiri Kosina: "Re: [RFC PATCH] livepatch: allow removal of a disabled patch"
Next in thread: Miroslav Benes: "Re: livepatch: allow removal of a disabled patch"

+++ Josh Poimboeuf [05/05/16 10:04 -0500]:

> On Thu, May 05, 2016 at 04:25:48PM +0200, Miroslav Benes wrote:
>
> > On Thu, 5 May 2016, Josh Poimboeuf wrote:
> >
> > > On Thu, May 05, 2016 at 10:28:12AM +0200, Miroslav Benes wrote:
> > > > I think it boils down to the following problem.
> > > >
> > > > 1. CONFIG_DEBUG_KOBJECT_RELEASE=y
> > > >
> > > > 2. we have dynamic kobjects, so there is a pointer in klp_patch to struct
> > > > kobject
> > > >
> > > > 3. it is allocated during klp_init_patch() and all is fine
> > > >
> > > > 4. now we want to remove the patch module. It is disabled and module_put()
> > > > is called. User calls rmmod on the module.
> > > >
> > > > 5. klp_unregister_patch() is called in __exit method.
> > > >
> > > > 6. klp_free_patch() is called.
> > > >
> > > > 7. kobject_put(patch->kobj) is called.
> > > >
> > > > ...now it gets interesting...
> > > >
> > > > 8. among others kobject_cleanup() is scheduled as a delayed work (this is
> > > > important).
> > > >
> > > > 9. there is no completion, so kobject_put returns and the module goes
> > > > away.
> > > >
> > > > 10. someone calls patch enabled_store attribute (for example). They can
> > > > because kobject_cleanup() has not been called yet. It is delayed
> > > > scheduled.
> > > >
> > > > ...crash...
> > >
> > > But what exactly causes the crash? In enabled_store() we can see that
> > > the patch isn't in the list, so we can return -EINVAL.
> >
> > Ok, bad example. Take enabled_show() instead. It could be fixed in the
> > same way, but I am not sure it is the right thing to do. It does not scale
> > because the problem is elsewhere.
> >
> > Anyway, it is (even if theoretically) there in my opinion and we
> > have two options.
> >
> > 1. We could forget about CONFIG_DEBUG_KOBJECT_RELEASE and all is ok
> > without completion and regardless of dynamic/static kobject allocation.
> >
> > 2. We introduce completion and we are ok even with
> > CONFIG_DEBUG_KOBJECT_RELEASE=y and again regardless of dynamic/static
> > kobject allocation.
>
> I would disagree with the statement that the dynamic kobject doesn't
> scale. We would just need a helper function to get from a kobject to
> its klp_patch.
>
> In fact, to me it seems like the right way to do it. It doesn't make
> sense for the code which creates the kobject to be different from the
> code which initializes it. It's slightly out of context, but
> kobject.txt does say:
>
> "Code which creates a kobject must, of course, initialize that object."
>
> I view the completion as a hack to compensate for the fact that we're
> abusing the kobject interface. And so it makes sense to me that
> CONFIG_DEBUG_KOBJECT_RELEASE would cause problems, because we're using
> kobjects in the wrong way.
>
> So in my view, the two options are:
>
> 1. Convert the kobject to dynamic as I described.
>
> 2. Change the klp_register() interface so that klp_patch gets allocated
> in livepatch code.
>
> I'd be curious to hear what others think.
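To make the completion idea concrete, here is a rough userspace model of it (not kernel code). struct kobject, struct completion, kobject_put() and wait_for_completion() below are simplified stand-ins, the one-slot "delayed work" queue mimics CONFIG_DEBUG_KOBJECT_RELEASE, and the `finish` field and klp_init_patch_kobj() are hypothetical names. It only sketches the ordering: the release callback signals, and the unregister path waits for that signal before the module (and the klp_patch inside it) can go away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: simplified stand-ins, not the kernel's types. */
struct kobject {
	int refcount;
	void (*release)(struct kobject *kobj);
};

struct completion {
	bool done;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Models CONFIG_DEBUG_KOBJECT_RELEASE: the release callback is not called
 * from kobject_put() directly but queued as "delayed work" (one slot). */
static struct kobject *pending_release;

static void run_delayed_work(void)
{
	struct kobject *kobj = pending_release;

	if (kobj) {
		pending_release = NULL;
		kobj->release(kobj);
	}
}

static void kobject_put(struct kobject *kobj)
{
	assert(kobj->refcount > 0);
	if (--kobj->refcount == 0)
		pending_release = kobj;	/* deferred, as with the debug option */
}

/* Single-threaded stand-in for wait_for_completion(): rather than sleeping
 * until another context signals, it runs the queued work itself. */
static void wait_for_completion(struct completion *c)
{
	while (!c->done) {
		assert(pending_release != NULL);	/* else we'd wait forever */
		run_delayed_work();
	}
}

struct klp_patch {
	struct kobject kobj;		/* embedded, lives in module memory */
	struct completion finish;	/* hypothetical field name */
};

static void klp_kobj_release_patch(struct kobject *kobj)
{
	struct klp_patch *patch = container_of(kobj, struct klp_patch, kobj);

	/* The module owns the klp_patch memory, so release only signals. */
	patch->finish.done = true;
}

static void klp_init_patch_kobj(struct klp_patch *patch)	/* hypothetical */
{
	patch->kobj.refcount = 1;
	patch->kobj.release = klp_kobj_release_patch;
	patch->finish.done = false;
}

/* Unregister path: drop our reference, then block until release has run,
 * so rmmod cannot free the klp_patch while the kobject still uses it. */
static void klp_free_patch(struct klp_patch *patch)
{
	kobject_put(&patch->kobj);
	wait_for_completion(&patch->finish);
}
```

In the kernel the wait would actually sleep until the delayed kobject_cleanup() runs (or until any other reference holder drops its reference), which is what holds up rmmod.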

So, I think both of these solutions would enable us to get rid of
the completion. Let me try to summarize.

For solution #1, if we dynamically allocate the kobject, i.e. klp_patch now
holds a pointer to it instead of embedding it, we no longer need to worry
about the corresponding klp_patch getting deallocated under our nose, since
the kobject's memory is no longer part of it. However, because
kobject_cleanup() is delayed with CONFIG_DEBUG_KOBJECT_RELEASE, it is
possible to have sysfs entries that refer to a klp_patch that no longer
exists. Thus, if any of the sysfs functions get called, we would have to
ensure that the klp_patch corresponding to the kobject in question actually
still exists. In this case, every sysfs function would require an extra
check to make sure the matching klp_patch is still on the patches list, and
return an error if it isn't found. The "pro" is that this change would be
simple; the "con" is that kobjects are now decoupled from, and managed
completely separately from, the object (klp_patch) with which they are
associated, which doesn't feel 100% right.
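A minimal userspace sketch of what solution #1 implies, assuming a plain singly linked list in place of the patches list and simplified stand-ins for struct kobject and its get/put helpers (not the kernel API). kobj_to_patch() and klp_unregister() are hypothetical names. The point is the extra lookup every sysfs handler has to do, because a reader can still hold the kobject after its klp_patch is gone:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Userspace model only: simplified stand-ins, not the kernel's types. */
struct kobject {
	int refcount;
};

struct klp_patch {
	struct kobject *kobj;	/* solution #1: separately allocated kobject */
	bool enabled;
	struct klp_patch *next;	/* stand-in for the kernel's patches list */
};

static struct klp_patch *klp_patches;	/* registered patches */

static struct kobject *kobject_create(void)
{
	struct kobject *kobj = calloc(1, sizeof(*kobj));

	if (kobj)
		kobj->refcount = 1;
	return kobj;
}

static struct kobject *kobject_get(struct kobject *kobj)
{
	kobj->refcount++;
	return kobj;
}

static void kobject_put(struct kobject *kobj)
{
	assert(kobj->refcount > 0);
	if (--kobj->refcount == 0)
		free(kobj);	/* no container to free: the kobject stands alone */
}

/* Hypothetical helper: container_of() can't be used on a pointer member,
 * so map the kobject back to its klp_patch by walking the list. In the
 * kernel this walk would have to happen under the livepatch lock. */
static struct klp_patch *kobj_to_patch(struct kobject *kobj)
{
	struct klp_patch *patch;

	for (patch = klp_patches; patch; patch = patch->next)
		if (patch->kobj == kobj)
			return patch;
	return NULL;
}

/* Every sysfs handler needs the same "is the patch still there?" check. */
static int enabled_show(struct kobject *kobj, char *buf, size_t len)
{
	struct klp_patch *patch = kobj_to_patch(kobj);

	if (!patch)
		return -EINVAL;
	return snprintf(buf, len, "%d\n", patch->enabled);
}

static void klp_unregister(struct klp_patch *patch)	/* hypothetical */
{
	struct klp_patch **pp;

	for (pp = &klp_patches; *pp; pp = &(*pp)->next) {
		if (*pp == patch) {
			*pp = patch->next;	/* unlink from the list */
			break;
		}
	}
	kobject_put(patch->kobj);	/* may not be the last reference */
	patch->kobj = NULL;
}
```

After klp_unregister() and free() of the patch, a kobject still held by a reader remains valid, but enabled_show() can only answer -EINVAL; the kobject itself is freed on the last kobject_put().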

For solution #2, we could have livepatch manage the (de)allocation of
klp_patch objects internally. Maybe in this scenario the caller would request
that livepatch allocate a klp_patch object, and would then fill out the
returned klp_patch struct as appropriate. In this case, we would be able to
leave the kobject embedded in the klp_patch struct (and dynamic kobjects
wouldn't be needed), as livepatch would now have control of both structures.

Then, during the patch module exit path, when kobject_put() is called, the
klp_patch struct would be freed in its kobject's release function. We
wouldn't have to hold up rmmod, and delayed execution of kobject_cleanup()
wouldn't break anything, because a klp_patch would then have the same
"lifespan" as its corresponding kobject. It would therefore be safe to
invoke enabled_store() & co. up until kobject_cleanup() is finally executed.
We'd be able to use container_of() in this case as well. In addition, we
wouldn't have to force all sysfs functions to support an awkward edge case
(i.e. checking whether the corresponding klp_patch still exists). I think
this solution also better matches the typical use of the kobject release
method, as described in kobject.txt (replacing 'my_object' with
klp_patch):
---
(...)
This notification is done through a kobject's release() method.
Usually such a method has a form like:

void my_object_release(struct kobject *kobj)
{
	struct my_object *mine = container_of(kobj, struct my_object, kobj);
	/* Perform any additional cleanup on this object, then... */
	kfree(mine);
}
---
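A minimal userspace sketch of solution #2, again with simplified stand-ins for struct kobject and its get/put helpers rather than the kernel API. klp_alloc_patch() and klp_patch_release() are hypothetical names for the livepatch-side allocator and the release method. Because the kobject is embedded and the release method frees the whole klp_patch, a reference held by a sysfs reader (or by a delayed kobject_cleanup()) keeps the patch alive, and container_of() is enough:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Userspace model only: simplified stand-ins, not the kernel's types. */
struct kobject {
	int refcount;
	void (*release)(struct kobject *kobj);
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct kobject *kobject_get(struct kobject *kobj)
{
	kobj->refcount++;
	return kobj;
}

static void kobject_put(struct kobject *kobj)
{
	assert(kobj->refcount > 0);
	if (--kobj->refcount == 0)
		kobj->release(kobj);
}

struct klp_patch {
	struct kobject kobj;	/* embedded: shares the patch's lifetime */
	bool enabled;
};

static int klp_patches_released;	/* lets the example observe release */

/* kobject.txt's release pattern with my_object replaced by klp_patch. */
static void klp_patch_release(struct kobject *kobj)
{
	struct klp_patch *patch = container_of(kobj, struct klp_patch, kobj);

	free(patch);
	klp_patches_released++;
}

/* Hypothetical allocator: livepatch owns the memory, the patch module
 * only fills in the fields of the returned struct. */
static struct klp_patch *klp_alloc_patch(void)
{
	struct klp_patch *patch = calloc(1, sizeof(*patch));

	if (!patch)
		return NULL;
	patch->kobj.refcount = 1;
	patch->kobj.release = klp_patch_release;
	return patch;
}

/* No list lookup: a caller holding a reference keeps the patch alive. */
static int enabled_show(struct kobject *kobj, char *buf, size_t len)
{
	struct klp_patch *patch = container_of(kobj, struct klp_patch, kobj);

	return snprintf(buf, len, "%d\n", patch->enabled);
}
```

The patch module's exit path would then just drop its reference; the klp_patch is freed whenever the last reference goes away, without waiting in rmmod.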
Apologies for the giant wall of text. In any case, I feel like solution #2
is more in line with how kobjects are normally used: embedded in the
structures they refer to, which get deallocated once their refcount hits 0.
What do people think?

Jessica
