Re: [PATCH 2/3] livepatch: add atomic replace

From: Petr Mladek
Date: Fri Aug 25 2017 - 05:26:31 EST


On Wed 2017-07-19 13:18:06, Jason Baron wrote:
> Introduce atomic replace, by first creating a 'no_op' klp_func for all
> the functions that are reverted by patch B.

> diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h
> index 5038337..6fd7222 100644
> --- a/include/linux/livepatch.h
> +++ b/include/linux/livepatch.h
> @@ -132,6 +134,7 @@ struct klp_object {
> * @kobj: kobject for sysfs resources
> * @obj_list: head of list for dynamically allocated struct klp_object
> * @enabled: the patch is enabled (but operation may be incomplete)
> + * @replaced: the patch has been replaced an can not be re-enabled

Is there any particular reason why the replaced patch could
no longer be enabled? The user could get around this restriction
by removing the replaced patch (rmmod) and applying it
again (insmod) but...


> * @finish: for waiting till it is safe to remove the patch module
> */
> struct klp_patch {
> @@ -201,8 +205,8 @@ static inline struct klp_func *func_iter_next(struct func_iter *iter)
> struct klp_func *func;
> struct klp_func_no_op *func_no_op;
>
> - if (iter->func->old_name || iter->func->new_func ||
> - iter->func->old_sympos) {
> + if (iter->func && (iter->func->old_name || iter->func->new_func ||
> + iter->func->old_sympos)) {

I guess that this check should already be in the previous patch.


> func = iter->func;
> iter->func++;
> } else {
> diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
> index e63f478..bf353da 100644
> --- a/kernel/livepatch/core.c
> +++ b/kernel/livepatch/core.c
> @@ -602,6 +605,118 @@ static void klp_free_patch(struct klp_patch *patch)
> +
> +static int klp_init_patch_no_ops(struct klp_patch *patch)
> +{
> + struct klp_object *obj, *prev_obj, *new_obj;
> + struct klp_func *prev_func, *func;
> + struct klp_func_no_op *new;
> + struct klp_patch *prev_patch;
> + struct obj_iter o_iter, prev_o_iter;
> + struct func_iter prev_f_iter, f_iter;
> + bool found, mod;
> +
> + if (patch->list.prev == &klp_patches)
> + return 0;
> +
> + prev_patch = list_prev_entry(patch, list);
> + klp_for_each_object(prev_patch, prev_obj, &prev_o_iter) {
> + if (!klp_is_object_loaded(prev_obj))
> + continue;

We must create the no_op object/function structures also for
not-yet-loaded modules. The transition might take a long time
and a module might be loaded in the meantime. klp_module_coming()
must be able to enable the stack of patches completely,
including the no_op ones.

Note that the module must be patched, even if only temporarily,
because some (not-yet-migrated) processes might still rely on
the semantics required by the to-be-removed patches.

This is actually one big advantage of the dynamically
allocated no_op functions. The infrastructure for
the coming/going modules should just work. The
same is true for klp_reverse_transition().
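
Just to illustrate the idea (untested sketch reusing the iterators
from this series):

	prev_patch = list_prev_entry(patch, list);
	klp_for_each_object(prev_patch, prev_obj, &prev_o_iter) {
		/*
		 * No klp_is_object_loaded() check here. The no_op
		 * structures must already exist when klp_module_coming()
		 * patches the module in the middle of the transition;
		 * the usual module-coming code resolves the addresses
		 * later, as it does for normal functions.
		 */
		klp_for_each_func(prev_obj, prev_func, &prev_f_iter) {
			/* create the no_op entries as before */
			...
		}
	}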


> +
> + klp_for_each_func(prev_obj, prev_func, &prev_f_iter) {
> + found = false;
> + klp_for_each_object(patch, obj, &o_iter) {
> + klp_for_each_func(obj, func, &f_iter) {
> + if ((strcmp(prev_func->old_name,
> + func->old_name) == 0) &&
> + (prev_func->old_sympos ==
> + func->old_sympos)) {
> + found = true;
> + break;
> + }
> + }
> + if (found)
> + break;
> + }
> + if (found)
> + continue;
> +
> + new = kmalloc(sizeof(*new), GFP_KERNEL);
> + if (!new)
> + return -ENOMEM;
> + new->orig_func = *prev_func;
> + new->orig_func.old_name = prev_func->old_name;
> + new->orig_func.new_func = NULL;

This is OK if the operation replaces all older patches. Otherwise,
you would need to store here the new_func address from the next
patch down the stack.


> + new->orig_func.old_sympos = prev_func->old_sympos;
> + new->orig_func.immediate = prev_func->immediate;
> + new->orig_func.old_addr = prev_func->old_addr;

Hmm, this should be

new->orig_func.old_addr = (unsigned long)prev_func->new_func;

klp_check_stack_func() should check the address of the function
variant that is currently running. It is the variant that is
on top of the func_stack.

I think that we actually have a bug in the current livepatch code
because old_addr always points to the original function!!!
I am going to look at it.


> + INIT_LIST_HEAD(&new->orig_func.stack_node);
> + new->orig_func.old_size = prev_func->old_size;

Same here. It should be the size of the currently used
function implementation (from the last active patch).
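
Putting the above together, the initialization could look roughly
like this (untested, and assuming that the new patch replaces all
older patches):

	new->orig_func = *prev_func;
	/* NULL is fine only when the new patch replaces all older patches. */
	new->orig_func.new_func = NULL;
	new->orig_func.new_size = 0;
	/*
	 * The variant currently on top of the func_stack is the one
	 * from the replaced patch, not the original code.
	 */
	new->orig_func.old_addr = (unsigned long)prev_func->new_func;
	new->orig_func.old_size = prev_func->new_size;
	new->orig_func.no_op = true;
	new->orig_func.patched = false;
	new->orig_func.transition = false;
	INIT_LIST_HEAD(&new->orig_func.stack_node);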


> + new->orig_func.new_size = 0;
> + new->orig_func.no_op = true;
> + new->orig_func.patched = false;
> + new->orig_func.transition = false;
> + found = false;
> + mod = klp_is_module(prev_obj);
> + klp_for_each_object(patch, obj, &o_iter) {
> + if (mod) {
> + if (klp_is_module(obj) &&
> + strcmp(prev_obj->name,
> + obj->name) == 0) {
> + found = true;
> + break;
> + }
> + } else if (!klp_is_module(obj)) {
> + found = true;
> + break;
> + }
> + }
> + if (found) {
> + list_add(&new->func_entry, &obj->func_list);
> + } else {
> + new_obj = kmalloc(sizeof(*new_obj), GFP_KERNEL);
> + if (!new_obj)
> + return -ENOMEM;
> + new_obj->name = prev_obj->name;
> + new_obj->funcs = NULL;
> + new_obj->mod = prev_obj->mod;

The new temporary object should be connected with the new patch.
I mean with patch->mod.

Another question is what to do with the kobjects in the dynamically
allocated klp_object and klp_func. I do not have a strong opinion here.

One thing is that the kobject hierarchy allows finding *patch from
*obj and *obj from *func pointers. I guess that we do not rely on
this in the current livepatch code. But it is a nice trick that
we use in Kgraft.

Also userspace might use the related /sys/ hierarchy to monitor
the livepatching situation. If we do not create the kobjects,
we will hide some information.

Finally, please try to split this function into more pieces.
I wonder if we could separate the code for each level
(patch, object, func).
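
For example, something along these lines (only a very rough sketch,
the helper names are made up):

static int klp_add_no_op_funcs(struct klp_patch *patch,
			       struct klp_object *prev_obj)
{
	/*
	 * Iterate over prev_obj's functions, skip the ones that the
	 * new patch replaces anyway, allocate a no_op for the rest
	 * and add it to the matching (possibly newly allocated)
	 * object in the new patch.
	 */
	...
}

static int klp_init_patch_no_ops(struct klp_patch *patch)
{
	struct klp_patch *prev_patch;
	struct klp_object *prev_obj;
	struct obj_iter o_iter;
	int ret;

	if (patch->list.prev == &klp_patches)
		return 0;

	prev_patch = list_prev_entry(patch, list);
	klp_for_each_object(prev_patch, prev_obj, &o_iter) {
		ret = klp_add_no_op_funcs(patch, prev_obj);
		if (ret)
			return ret;
	}

	return 0;
}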


> + new_obj->patched = false;
> + INIT_LIST_HEAD(&new_obj->func_list);
> + INIT_LIST_HEAD(&new_obj->obj_entry);
> + list_add(&new->func_entry, &new_obj->func_list);
> + list_add(&new_obj->obj_entry, &patch->obj_list);
> + }
> + }
> + }
> +
> + return 0;
> +}
> +
> static int klp_init_func(struct klp_object *obj, struct klp_func *func)
> {
> if (!func->old_name || !func->new_func)

[...]

> diff --git a/kernel/livepatch/patch.c b/kernel/livepatch/patch.c
> index 1cfdabc..cbb8b9d 100644
> --- a/kernel/livepatch/patch.c
> +++ b/kernel/livepatch/patch.c
> @@ -117,6 +117,8 @@ static void notrace klp_ftrace_handler(unsigned long ip,
> }
> }
>
> + if (func->no_op)
> + goto unlock;

JFYI, this means that we will use the original (unpatched) code.
It is perfectly fine if the new patch replaces all older ones.
It is wrong in the current concept where you want to replace
only the last patch on the stack. See my comments about
orig_func.new_func above.

To make it clear, I am all for replacing all patches on the stack.
IMHO, it is much more practical. The result is well defined.

If you remove only the last patch, the result depends on
the other patches on the stack which is harder to predict.
IMHO, replacing all patches is a better semantic.


> klp_arch_set_pc(regs, (unsigned long)func->new_func);
> unlock:
> preempt_enable_notrace();
> @@ -135,7 +137,7 @@ static unsigned long klp_get_ftrace_location(unsigned long faddr)
> }
> #endif
>
> -static void klp_unpatch_func(struct klp_func *func)
> +void klp_unpatch_func(struct klp_func *func, bool unregistered)
> {
> struct klp_ops *ops;
>
> @@ -155,9 +157,11 @@ static void klp_unpatch_func(struct klp_func *func)
> if (WARN_ON(!ftrace_loc))
> return;
>
> - WARN_ON(unregister_ftrace_function(&ops->fops));
> - WARN_ON(ftrace_set_filter_ip(&ops->fops, ftrace_loc, 1, 0));
> -
> + if (!unregistered) {
> + WARN_ON(unregister_ftrace_function(&ops->fops));
> + WARN_ON(ftrace_set_filter_ip(&ops->fops, ftrace_loc, 1,
> + 0));
> + }
> list_del_rcu(&func->stack_node);

IMHO, we should rather unpatch the replaced patches before
we remove the temporary no_op stuff. Then this should not
be needed, see below.


> list_del(&ops->node);
> kfree(ops);
> diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> index e112826..43e1609 100644
> --- a/kernel/livepatch/transition.c
> +++ b/kernel/livepatch/transition.c
> @@ -70,6 +72,7 @@ static void klp_synchronize_transition(void)
> schedule_on_each_cpu(klp_sync);
> }
>
> +
> /*
> * The transition to the target patch state is complete. Clean up the data
> * structures.
> @@ -81,8 +84,32 @@ static void klp_complete_transition(void)
> struct task_struct *g, *task;
> unsigned int cpu;
> bool immediate_func = false;
> + bool no_op = false;
> struct obj_iter o_iter;
> struct func_iter f_iter;
> + unsigned long ftrace_loc;
> + struct klp_ops *ops;
> + struct klp_patch *prev_patch;
> +
> + /* remove ftrace hook for all no_op functions. */
> + if (klp_target_state == KLP_PATCHED) {

IMHO, this is the right place to call:

klp_unpatch_objects(prev_patch);

It will just remove the replaced functions from the func_stack(s).
Each affected stack has at least two entries: the replaced function
and the one from the new patch (real or no_op). The one from the
new patch stays on top of the stack.
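
I mean something like the following in klp_complete_transition()
(untested, ignoring the exact guard conditions):

	if (klp_target_state == KLP_PATCHED) {
		prev_patch = list_prev_entry(klp_transition_patch, list);
		if (prev_patch->enabled) {
			/*
			 * Remove the replaced functions from the
			 * func_stacks. The function from the new patch
			 * (real or no_op) stays on top, so the running
			 * code does not change.
			 */
			klp_unpatch_objects(prev_patch);
		}
	}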


> + klp_for_each_object(klp_transition_patch, obj, &o_iter) {
> + klp_for_each_func(obj, func, &f_iter) {
> + if (!func->no_op)
> + continue;
> +
> + ops = klp_find_ops(func->old_addr);
> + if (WARN_ON(!ops))
> + continue;
> + ftrace_loc = func->old_addr;
> + WARN_ON(unregister_ftrace_function(&ops->fops));
> + WARN_ON(ftrace_set_filter_ip(&ops->fops,
> + ftrace_loc,
> + 1, 0));
> + no_op = true;

I wonder if we could reuse klp_unpatch_object() and
klp_unpatch_func() here. We could add a no_op parameter
to unpatch only the no_op stuff.
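
For example (rough sketch; it assumes that klp_unpatch_func() keeps
its original single argument):

void klp_unpatch_object(struct klp_object *obj, bool no_op_only)
{
	struct klp_func *func;
	struct func_iter f_iter;

	klp_for_each_func(obj, func, &f_iter) {
		if (no_op_only && !func->no_op)
			continue;
		if (func->patched)
			klp_unpatch_func(func);
	}

	if (!no_op_only)
		obj->patched = false;
}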


> + }
> + }
> + }
>
> if (klp_target_state == KLP_UNPATCHED) {
> /*
> @@ -132,6 +161,24 @@ static void klp_complete_transition(void)
> }
>
> done:
> + /* remove and free any no_op functions */
> + if (no_op && klp_target_state == KLP_PATCHED) {
> + prev_patch = list_prev_entry(klp_transition_patch, list);
> + if (prev_patch->enabled) {
> + klp_unpatch_objects(prev_patch);
> + prev_patch->enabled = false;

Hmm, the patch is normally marked disabled before the transition
starts, see __klp_disable_patch().

We should do this in __klp_enable_patch(). It needs to be handled
also in klp_reverse_transition().
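
I mean roughly the following in __klp_enable_patch() (untested;
klp_reverse_transition() would have to set it back):

	/* next to where patch->enabled gets set */
	if (patch->list.prev != &klp_patches) {
		prev_patch = list_prev_entry(patch, list);
		/*
		 * The replaced patch is on its way out. Mark it
		 * disabled here, the same way __klp_disable_patch()
		 * does for a normal disable.
		 */
		prev_patch->enabled = false;
	}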


> + prev_patch->replaced = true;
> + module_put(prev_patch->mod);
> + }
> + klp_for_each_object(klp_transition_patch, obj, &o_iter) {
> + klp_for_each_func(obj, func, &f_iter) {
> + if (func->no_op)
> + klp_unpatch_func(func, true);
> + }
> + }

This should be done earlier. See the hint above about adding
a no_op parameter to klp_unpatch_object() and klp_unpatch_func()
and reusing their code.

My intention is that we should ideally do the same actions in
the same locations as we already do for normal patches.
I mean manipulating the various flags, ftrace handlers, and
other per-patch/object/func values. It should simplify the
review and even allow reusing some more code.


> + klp_patch_free_no_ops(klp_transition_patch);
> + }
> +
> klp_target_state = KLP_UNDEFINED;
> klp_transition_patch = NULL;
> }
> @@ -204,10 +251,18 @@ static int klp_check_stack_func(struct klp_func *func,
> if (klp_target_state == KLP_UNPATCHED) {
> /*
> * Check for the to-be-unpatched function
> - * (the func itself).
> + * (the func itself). If we're unpatching
> + * a no-op, then we're running the original
> + * function. We never 'patch' a no-op function,
> + * since we just remove the ftrace hook.
> */
> - func_addr = (unsigned long)func->new_func;
> - func_size = func->new_size;
> + if (func->no_op) {
> + func_addr = (unsigned long)func->old_addr;
> + func_size = func->old_size;
> + } else {
> + func_addr = (unsigned long)func->new_func;
> + func_size = func->new_size;

This looks wrong. It should not be needed if we set
new_func and old_addr correctly in klp_init_patch_no_ops().

Thanks a lot for working on this feature. You did a great job.
It seems that it was functional. Most of my hints should just
help to better integrate it into the existing code.

Best Regards,
Petr