Re: [RFC PATCH v1.9 12/14] livepatch: create per-task consistency model

From: Miroslav Benes
Date: Thu Apr 14 2016 - 05:25:30 EST


On Fri, 25 Mar 2016, Josh Poimboeuf wrote:

> Add a basic per-task consistency model. This is the foundation which
> will eventually enable us to patch those ~10% of security patches which
> change function prototypes and/or data semantics.
>
> When a patch is enabled, livepatch enters into a transition state where
> tasks are converging from the old universe to the new universe. If a
> given task isn't using any of the patched functions, it's switched to
> the new universe. Once all the tasks have been converged to the new
> universe, patching is complete.
>
> The same sequence occurs when a patch is disabled, except the tasks
> converge from the new universe to the old universe.
>
> The /sys/kernel/livepatch/<patch>/transition file shows whether a patch
> is in transition. Only a single patch (the topmost patch on the stack)
> can be in transition at a given time. A patch can remain in the
> transition state indefinitely, if any of the tasks are stuck in the
> previous universe.
>
> A transition can be reversed and effectively canceled by writing the
> opposite value to the /sys/kernel/livepatch/<patch>/enabled file while
> the transition is in progress. Then all the tasks will attempt to
> converge back to the original universe.

So after spending some time with this patch I must say hats off to you. I
haven't managed to find anything serious yet and I doubt I would. Few
things below.

> +/*
> + * klp_update_task_universe() - change the patched state of a task
> + * @task: The task to change
> + *
> + * Converts the patched state of the task so that it will switch to the set of
> + * functions in the goal universe.
> + */
> +static inline void klp_update_task_universe(struct task_struct *task)
> +{
> + /*
> + * The corresponding write barriers are in klp_init_transition() and
> + * klp_start_transition(). See the comments there for an explanation.
> + */
> + smp_rmb();
> +
> + task->klp_universe = klp_universe_goal;
> +}

I wonder whether we should introduce a static_key for this function.
klp_update_task_universe() is called in some important code paths and it
is called everytime if CONFIG_LIVEPATCH is on. It does not matter much for
syscall paths, because we enter slowpath there only if TIF_KLP_NEED_UPDATE
is set and that is set iff patching is in progress. But we call the
function also elsewhere. So maybe it would be nice to introduce static_key
in the function which would be on if the patching is in progress and off
if not.

Maybe it is just an overoptimization though.

[...]

> +static bool klp_try_switch_task(struct task_struct *task)
> +{
> + struct rq *rq;
> + unsigned long flags;
> + int ret;
> + bool success = false;
> +
> + /* check if this task has already switched over */
> + if (task->klp_universe == klp_universe_goal)
> + return true;

[...]

> +/*
> + * This function can be called in the middle of an existing transition to
> + * reverse the direction of the universe goal. This can be done to effectively
> + * cancel an existing enable or disable operation if there are any tasks which
> + * are stuck in the starting universe.
> + */
> +void klp_reverse_transition(void)
> +{
> + struct klp_patch *patch = klp_transition_patch;
> +
> + klp_start_transition(!klp_universe_goal);
> + klp_try_complete_transition();
> +
> + patch->enabled = !patch->enabled;
> +}

This function could be called iff the patching is in progress, there are
some not-yet-migrated tasks and we wait for scheduled workqueue to run
klp_try_complete_transition() again, right? I have two questions.

1. Is the call to klp_try_complete_transition() here necessary? It would
be called via workqueue, wouldn't it? I suspect we don't gain much with
this and we introduce possible future problems because of "double
success" of klp_try_complete_transition() (nothing serious there as of
now though). But I might be wrong because I got really confused by this
piece of code and its context :)

2. klp_reverse_transition() works well because not-yet-migrated tasks are
still in KLP_UNIVERSE_OLD (without loss of generality) and since
klp_universe_goal is swapped they are in fact in their target universe.
klp_try_switch_task() returns immediately in such case. The rest is
migrated in an ordinary way.

What about TIF_KLP_NEED_UPDATE (I know it is introduced in later patch but
it is easier to discuss it here)? klp_start_transition() resets the flag
for all the task. Shouldn't the flag be cleared for the task in
klp_try_switch_task() if task->klp_universe == klp_universe_goal? In fact
it does not matter if it is set. The patching success is determined by
klp_func->klp_universe booleans. But those tasks would needlessly go
through syscall slowpaths (see note above about static_key) which could
hurt performance.

Does this make sense?

[...]

transition.h:
>
> +extern void klp_init_transition(struct klp_patch *patch, int universe);
> +extern void klp_start_transition(int universe);
> +extern void klp_reverse_transition(void);
> +extern bool klp_try_complete_transition(void);
> +extern void klp_complete_transition(void);

externs are superfluous here and we do not have them in other header
files.

Anyway, great job!

Miroslav