Re: [RFC PATCH v1.9 08/14] livepatch: separate enabled and patched states

From: Josh Poimboeuf
Date: Tue Apr 12 2016 - 14:25:25 EST


On Tue, Apr 12, 2016 at 12:35:49PM -0500, Chris J Arges wrote:
> On Tue, Apr 12, 2016 at 12:16:00PM -0500, Josh Poimboeuf wrote:
> > On Tue, Apr 12, 2016 at 09:44:43AM -0500, Chris J Arges wrote:
> > > On Fri, Mar 25, 2016 at 02:34:55PM -0500, Josh Poimboeuf wrote:
> > > > Once we have a consistency model, patches and their objects will be
> > > > enabled and disabled at different times. For example, when a patch is
> > > > disabled, its loaded objects' funcs can remain registered with ftrace
> > > > indefinitely until the unpatching operation is complete and they're no
> > > > longer in use.
> > > >
> > > > It's less confusing if we give them different names: patches can be
> > > > enabled or disabled; objects (and their funcs) can be patched or
> > > > unpatched:
> > > >
> > > > - Enabled means that a patch is logically enabled (but not necessarily
> > > > fully applied).
> > > >
> > > > - Patched means that an object's funcs are registered with ftrace and
> > > > added to the klp_ops func stack.
> > > >
> > > > Also, since these states are binary, represent them with booleans
> > > > instead of ints.
> > > >
> > >
> > > Josh,
> > >
> > > Awesome work here first of all!
> > >
> > > Looking through the patchset a bit I see the following bools:
> > > - functions: patched, transitioning
> > > - objects: patched
> > > - patches: enabled
> > >
> > > It seems this reflects the following states at a patch level:
> > > disabled - module inserted, not yet logically enabled
> > > enabled - logically enabled, but not all objects/functions patched
> > > transitioning - objects/functions are being applied or reverted
> > > patched - all objects/functions patched
> > >
> > > However each object and function could have the same state and the parent object
> > > just reflects the 'aggregate state'. For example if all funcs in an object are
> > > patched then the object is also patched.
> > >
> > > Perhaps we need more states (or maybe there will be more in the future), but
> > > wouldn't this just be easier to have something like for each patch, object, and
> > > function?
> > >
> > > enum klp_state{
> > > KLP_DISABLED,
> > > KLP_ENABLED,
> > > KLP_TRANSITION,
> > > KLP_PATCHED,
> > > }
> > >
> > >
> > > I'm happy to help out too.
> >
> > Thanks for the comments. First let me try to explain why I chose two
> > bools rather than a single state variable.
> >
> > At a func level, it's always in one of the following states:
> >
> > patched=0 transition=0: unpatched
> > patched=0 transition=1: unpatched, temporary starting state
> > patched=1 transition=1: patched, may be visible to some tasks
> > patched=1 transition=0: patched, visible to all tasks
> >
> > And for unpatching, it goes in the reverse order:
> >
> > patched=1 transition=0: patched, visible to all tasks
> > patched=1 transition=1: patched, may be visible to some tasks
> > patched=0 transition=1: unpatched, temporary ending state
> > patched=0 transition=0: unpatched
> >
> > (note to self, put the above in a comment somewhere)
> >
>
> This is helpful and makes more sense.
>
> > Now, we could convert the states from two bools into a single enum. But
> > I think it would complicate the code. Because nowhere in the code does
> > it need to access the full state. In some places it accesses
> > func->patched and in other places it accesses func->transition, but it
> > never needs to access both.
> >
> > So for example, the following check in klp_ftrace_handler():
> >
> > if (unlikely(func->transition)) {
> >
> > would change to:
> >
> > if (unlikely(func->state == KLP_ENABLED || func->state == KLP_TRANSITION)) {
> >
> > Sure, we could use a helper function to make that more readable. But
> > with the bools its clearer and you don't need a helper function.
> >
> > As another example, see the following code in klp_complete_transition():
> >
> > klp_for_each_object(klp_transition_patch, obj)
> > klp_for_each_func(obj, func)
> > func->transition = false;
> >
> > The last line would have to be changed to something like:
> >
> > if (patching...)
> > func->state = KLP_PATCHED;
> > else /* unpatching */
> > func->state = KLP_DISABLED;
> >
> > So that's why I picked two bools over a single state variable: it seems
> > to make the code simpler.
> >
> > As to the other idea about copying the func states to the object and
> > patch level, I get the feeling that would also complicate the code. We
> > patch and transition at a function granularity, so the "real" state is
> > at the func level. Proliferating that state to objects and patches
> > might be tricky to get right, and it could make it harder to understand
> > exactly what's going on in the code. And I don't really see a benefit
> > to doing that.
> >
> > --
> > Josh
> >
>
> I think keeping it simple makes a ton of sense, thanks for the explanation.
>
> I'm also envisioning how to troubleshoot patches that don't converge.
>
> So if someone wanted to check if a function has been fully patched on all tasks
> they would check:
>
> /sys/kernel/livepatch/<patch>/transition
> /sys/kernel/livepatch/<patch>/patched
>
> And see if patched=1 and transition=0 things are applied.
>
> However if patched=1 and transition=1 then if a user wanted to dig down and see
> which pid's we were waiting on we could do:
>
> cat /proc/*/patch_status
>
> and check if any pid's patch_status values still contain KLP_UNIVERSE_OLD.
>
> If we wanted to see which function in the task needs patching we could:
> cat /proc/<pid>/stack
> and see if anything in that stack contains the functions in question.

Yeah. It's not ideal that the user has to check multiple places to see
the "state". But IMO that's not a big deal, and a user space tool
should be able to make it more user friendly.

> Anyway I'll keep looking at this patchset, patching using the
> samples/livepatch code works for me without issue so far.

Ok, thanks!

--
Josh