Re: [PATCH v10 06/10] livepatch: Add atomic replace

From: Petr Mladek
Date: Thu Mar 22 2018 - 11:43:26 EST


On Tue 2018-03-20 16:26:19, Josh Poimboeuf wrote:
> On Tue, Mar 20, 2018 at 03:35:01PM +0100, Petr Mladek wrote:
> > On Tue 2018-03-13 17:48:04, Josh Poimboeuf wrote:
> > > On Wed, Mar 07, 2018 at 09:20:35AM +0100, Petr Mladek wrote:
> > > > This patch adds a new "replace" flag to struct klp_patch. When it is
> > > > enabled, a set of 'nop' klp_func will be dynamically created for all
> > > > functions that are already being patched but that will no longer be
> > > > modified by the new patch. They are temporarily used as a new target
> > > > during the patch transition.
> > > >
> > > > diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
> > > > index fd0296859ff4..ad508a86b2f9 100644
> > > > --- a/kernel/livepatch/core.c
> > > > +++ b/kernel/livepatch/core.c
> > > > +static int klp_add_nops(struct klp_patch *patch)
> > > > +{
> > > > + struct klp_patch *old_patch;
> > > > + struct klp_object *old_obj;
> > > > + int err = 0;
> > > > +
> > > > + if (WARN_ON(!patch->replace))
> > > > + return -EINVAL;
> > >
> > > IMO, this is another one of those overly paranoid warnings that isn't
> > > really needed. Why would we call klp_add_nops() for a non-replace
> > > patch?
> >
> > Just to be sure. What is the difference, for example, against the following
> > checks in __klp_enable_patch() from your point of view, please?
> >
> > if (klp_transition_patch)
> > return -EBUSY;
> >
> > if (WARN_ON(patch->enabled))
> > return -EINVAL;
> >
> > One difference is that klp_enable_patch() is exported symbol. One the
> > other hand, livepatch code developers could do mistakes as well.
> > Adding nops sounds like an innoncent operation after all ;-)
>
> But klp_enable_patch() being an exported symbol is an important
> difference. It catches a patch author abusing the interface. Which is
> much more likely than one of us accidentally calling klp_add_nops().
> Have you not noticed how thorough our code reviews are? ;-)
>
> Anyway, I suppose it's a harmless check and I don't feel very strongly
> about it, it just seems unnecessary.

I have removed the check.


> > > > diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> > > > index 6917100fbe79..d6af190865d2 100644
> > > > --- a/kernel/livepatch/transition.c
> > > > +++ b/kernel/livepatch/transition.c
> > > > @@ -87,6 +87,36 @@ static void klp_complete_transition(void)
> > > > klp_transition_patch->mod->name,
> > > > klp_target_state == KLP_PATCHED ? "patching" : "unpatching");
> > > >
> > > > + /*
> > > > + * For replace patches, we disable all previous patches, and replace
> > > > + * the dynamic no-op functions by removing the ftrace hook.
> > > > + */
> > > > + if (klp_transition_patch->replace && klp_target_state == KLP_PATCHED) {
> > > > + /*
> > > > + * Make sure that no ftrace handler accesses any older patch
> > > > + * on the stack. This might happen when the user forced the
> > > > + * transaction while some running tasks were still falling
> > > > + * back to the old code. There might even still be ftrace
> > > > + * handlers that have not seen the last patch on the stack yet.
> > > > + *
> > > > + * It probably is not necessary because of the rcu-safe access.
> > > > + * But better be safe than sorry.
> > > > + */
> > > > + if (klp_forced)
> > > > + klp_synchronize_transition();
> > >
> > > I don't like this. Hopefully we can get just rid of it, if we also get
> > > rid of the concept of "throwing away" patches like I proposed.
> >
> > What exactly you do not like about it, please?
> >
> > It is not needed if all processes were migrated using the consistency
> > model, definitely.
> >
> > If the transition has been forced then the barrier should be needed from
> > similar reasons as the barrier after klp_unpatch_objects() below.
> > We basically want to be sure what ftrace handlers see on the stack.
> >
> > Will it help, when I remove the last paragraph where the formulation
> > is quite uncertain?
>
> Well, the last paragraph doesn't inspire a lot of confidence ;-) It
> sounds like voodoo.

Races are never easy area. You are right that I was not completely
confident and wanted to be on the safe side. Your questions helped
me to realize that the synchronization is not neeeded.


> Also the comment just seems very confusing to me:
>
> - What specifically is it protecting against, e.g., _why_ should no
> ftrace handler access any old patch on the stack, and when shouldn't
> it do so? Is the barrier needed before func->transition is cleared,
> or what?

I was afraid of invalid memory accessed in klp_ftrace_handler().
I simply underestimated the power of RCU. I am not that familiar
with it. Also the following is a bit non-standard:

func = list_entry_rcu(func->stack_node.next,
struct klp_func, stack_node);

Anyway, you made me to check it. All looks safe after all.

> - Does RCU make it safe, or doesn't it? If yes, why is this needed? If
> no, why not?

Yes. I removed the synchronization. Instead, I explained the situation
in a comment above klp_discard_replaced_patches().



> > > > +
> > > > + klp_throw_away_replaced_patches(klp_transition_patch,
> > > > + klp_forced);
> > > > +
> > > > + /*
> > > > + * There is no need to synchronize the transition after removing
> > > > + * nops. They must be the last on the func_stack. Ftrace
> > > > + * gurantees that nobody will stay in the trampoline after
> > >
> > > "guarantees"
> > >
> > > > + * the ftrace handler is unregistered.
> > > > + */
> > > > + klp_unpatch_objects(klp_transition_patch, KLP_FUNC_NOP);
> > > > + }
> > > > +
> > > > if (klp_target_state == KLP_UNPATCHED) {
> > > > /*
> > > > * All tasks have transitioned to KLP_UNPATCHED so we can now
> > > > @@ -143,6 +173,15 @@ static void klp_complete_transition(void)
> > > > if (!klp_forced && klp_target_state == KLP_UNPATCHED)
> > > > module_put(klp_transition_patch->mod);
> > > >
> > > > + /*
> > > > + * We do not need to wait until the objects are really freed.
> > > > + * The patch must be on the bottom of the stack. Therefore it
> > > > + * will never replace anything else. The only important thing
> > > > + * is that we wait when the patch is being unregistered.
> > > > + */
> > > > + if (klp_transition_patch->replace && klp_target_state == KLP_PATCHED)
> > > > + klp_free_objects(klp_transition_patch, KLP_FUNC_NOP);
> > > > +
> > >
> > > This makes me a bit nervous. What happens if the patch is enabled, then
> > > disabled, then enabled again? Then klp_free_objects() wouldn't do
> > > anything, because the ops would already be freed.
> >
> > They are not necessary when all replaced patches are removed from
> > the stack. There will be no livepatch if this one gets disabled.
>
> My point was that if you enable, then disable, then enable again,
> klp_free_objects() will get called again, and it will do nothing the
> second time around.

> Maybe that's safe in this instance, but in general, it's easy to forget
> the re-enable case when adding special cases for 'patch->replace'.

Yes, it is safe.


> I get the feeling that it would be safer to just clear 'patch->replace'
> after this step to avoid such scenarios. After all, when re-enabling a
> 'replace' patch, it's no longer replacing anything (assuming here that a
> replace patch will permanently disable all previous patches).

I am not completely comfortable with touching item that is set by the
author of the patch. On the other hand, I do not see how it could
harm. I have just added the line to disable the flag.

Best Regards,
Petr