Re: [RFC PATCH 6/9] livepatch: create per-task consistency model

From: Josh Poimboeuf
Date: Tue Feb 10 2015 - 11:56:50 EST

In reply to: Miroslav Benes: "Re: [RFC PATCH 6/9] livepatch: create per-task consistency model"
Next in thread: Miroslav Benes: "Re: [RFC PATCH 6/9] livepatch: create per-task consistency model"

On Tue, Feb 10, 2015 at 04:59:17PM +0100, Miroslav Benes wrote:
>
> On Mon, 9 Feb 2015, Josh Poimboeuf wrote:
>
> > Add a basic per-task consistency model. This is the foundation which
> > will eventually enable us to patch those ~10% of security patches which
> > change function prototypes and/or data semantics.
> >
> > When a patch is enabled, livepatch enters into a transition state where
> > tasks are converging from the old universe to the new universe. If a
> > given task isn't using any of the patched functions, it's switched to
> > the new universe. Once all the tasks have been converged to the new
> > universe, patching is complete.
> >
> > The same sequence occurs when a patch is disabled, except the tasks
> > converge from the new universe to the old universe.
> >
> > The /sys/kernel/livepatch/<patch>/transition file shows whether a patch
> > is in transition. Only a single patch (the topmost patch on the stack)
> > can be in transition at a given time. A patch can remain in the
> > transition state indefinitely, if any of the tasks are stuck in the
> > previous universe.
> >
> > A transition can be reversed and effectively canceled by writing the
> > opposite value to the /sys/kernel/livepatch/<patch>/enabled file while
> > the transition is in progress. Then all the tasks will attempt to
> > converge back to the original universe.
>
> Hi Josh,
>
> first, thanks a lot for great work. I'm starting to go through it and it's
> gonna take me some time to do and send a complete review.

I know there are a lot of details to look at, so please take your time.
I really appreciate your review. (And everybody else's, for that matter
:-)

> > +	/* success! unpatch obsolete functions and do some cleanup */
> > +
> > +	if (klp_universe_goal == KLP_UNIVERSE_OLD) {
> > +		klp_unpatch_objects(klp_transition_patch);
> > +
> > +		/* prevent ftrace handler from reading old func->transition */
> > +		synchronize_rcu();
> > +	}
> > +
> > +	pr_notice("'%s': %s complete\n", klp_transition_patch->mod->name,
> > +		  klp_universe_goal == KLP_UNIVERSE_NEW ? "patching" :
> > +		  "unpatching");
> > +
> > +	klp_complete_transition();
> > +}
>
> ...synchronize_rcu() could be insufficient. There still can be some
> process in our ftrace handler after the call.
>
> Consider the following scenario:
>
> When synchronize_rcu is called some process could have been preempted on
> some other cpu somewhere at the start of the ftrace handler before
> rcu_read_lock. synchronize_rcu waits for the grace period to pass, but that
> does not mean anything for our process in the handler, because it is not
> in rcu critical section. There is no guarantee that after synchronize_rcu
> the process would be away from the handler.
>
> "Meanwhile" klp_try_complete_transition continues and calls
> klp_complete_transition. This clears func->transition flags. Now the
> process in the handler could be scheduled again. It reads the wrong value
> of func->transition and redirection to the wrong function is done.
>
> What do you think? I hope I made myself clear.

You really made me think. But I don't think there's a race here.

Consider the two separate cases, patching and unpatching:

1. patching has completed: klp_universe_goal and all tasks'
   klp_universes are at KLP_UNIVERSE_NEW. In this case, the value of
   func->transition doesn't matter, because we want to use the func at
   the top of the stack, and if klp_universe is NEW, the ftrace handler
   will do that, regardless of the value of func->transition. This is
   why I didn't do the synchronize_rcu() in this case. But maybe you're
   not worried about this case anyway, I just described it for the sake
   of completeness :-)

2. unpatching has completed: klp_universe_goal and all tasks'
   klp_universes are at KLP_UNIVERSE_OLD. In this case, the value of
   func->transition _does_ matter. However, notice that
   klp_unpatch_objects() is called before synchronize_rcu(). That
   removes the "new" func from the klp_ops stack. Since the ftrace
   handler accesses the list _after_ calling rcu_read_lock(), it will
   never see the "new" func, and thus func->transition will never be
   set.

   That said, I think there is a race where the WARN_ON_ONCE(!func)
   could trigger here, and it wouldn't be an error. So I think I'll
   remove the warning.
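
To spell out the two cases, here's a rough userspace model of the choice
the handler makes, as I described it above. It's only a sketch: struct
klp_func_model and klp_pick_func() are made-up names for illustration,
not code from the patch, and it ignores RCU and memory barriers
entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; they only mirror the names used in this thread. */
enum klp_universe { KLP_UNIVERSE_OLD, KLP_UNIVERSE_NEW };

struct klp_func_model {
	const char *name;
	bool transition;		/* true while this func's patch is in transition */
	struct klp_func_model *next;	/* next (older) func on the stack, or NULL */
};

/*
 * Pick the func a task should run, given the top of the func stack and
 * the task's universe. NULL means "run the original, unpatched code".
 */
static const struct klp_func_model *
klp_pick_func(const struct klp_func_model *top, enum klp_universe task_universe)
{
	const struct klp_func_model *func = top;

	/* An empty stack is not an error in this model. */
	if (!func)
		return NULL;

	/*
	 * Only a task still in the OLD universe cares about the flag:
	 * it skips the func that is in transition. A task in the NEW
	 * universe always gets the top of the stack.
	 */
	if (func->transition && task_universe == KLP_UNIVERSE_OLD)
		func = func->next;

	return func;
}
```

With fix_v2 (in transition) stacked on top of fix_v1, a task in the NEW
universe gets fix_v2 and a task in the OLD universe gets fix_v1. Once
klp_unpatch_objects() has removed fix_v2, the handler starts from
fix_v1 and never looks at fix_v2's transition flag. And if the removed
func was the only one on the stack, the stack is empty, which is the
benign !func case I mentioned above.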

Does that make sense?

> There is a similar problem for dynamic trampolines in ftrace. You
> cannot remove them unless there is no process in the handler. I think
> rcu-tasks were merged a while ago for this purpose. However ftrace
> does not use them yet and I don't know if we could exploit them to
> solve this issue. I need to think more about it.

Ok, sounds like that's an ftrace bug that could affect us.

--
Josh