Re: [PATCH 3/3] livepatch: force transition process to finish

From: Petr Mladek
Date: Mon May 29 2017 - 08:28:20 EST


On Fri 2017-05-26 12:37:56, Josh Poimboeuf wrote:
> On Thu, May 25, 2017 at 06:03:07PM +0200, Petr Mladek wrote:
> > On Thu 2017-05-25 14:59:55, Miroslav Benes wrote:
> > >
> > > > > > In fact, I would suggest to take klp_mutex in force_store()
> > > > > > and do all actions synchronously, including the check
> > > > > > of klp_transition_patch.
> > > > >
> > > > > I still think it is better not do it. klp_unmark_tasks() does nothing else
> > > > > than tasks already do. They call klp_update_patch_state() by themselves
> > > > > and they do not grab klp_mutex lock for doing that. klp_unmark_tasks()
> > > > > only forces this action.
> > > >
> > > > You have a point. But I am not convinced ;-) klp_update_patch_state()
> > > > was called very carefully only when it was safe. The forcing
> > > > intentionally breaks the consistency model. User should really know
> > > > what they are doing when they use this feature.
> > > >
> > > > I think that we should actually taint the kernel. Developers should
> > > > know when users were pulling their legs.
> > >
> > > We could do that. I can change pr_warn() to WARN_ON_ONCE(), which would of
> > > course taint the kernel.
> >
> > Sounds good to me.
>
> I'm thinking that WARN_ON_ONCE() seems too severe. If the patch didn't
> need a consistency model in the first place then it wouldn't be worth
> warning about.
>
> We have to trust that the user knows what they're doing. And that's
> true for the entire live patching process, including patch analysis and
> patch creation. And anyway we already have a taint flag for that:
> TAINT_LIVEPATCH.

But the force is done on the user side. Let's say that the authors of
the livepatch code and of the patches know what they are doing.
Could we expect the same from the admins that apply the patches?

TAINT_LIVEPATCH is set because the system behaves differently
than with the original code. But it still should be consistent.
Using the force migration might move the system to a wonder land.


> > > > > On the other hand, I do not see a problem in doing that. We already have a
> > > > > relationship between klp_mutex and tasklist_lock defined elsewhere, so it
> > > > > is safe.
> > > >
> > > > Yup.
> > > >
> > > > > It would only serialize things needlessly.
> > > >
> > > > I do not agree. The speed is not important here. Also look
> > > > into klp_reverse_transition(). We explicitly clear all
> > > > TIF_PATCH_PENDING flags and call synchronize_rcu() just
> > > > to make the situation easier and reduce space for potential
> > > > mistakes.
> > >
> > > Yes, because we had to do that. We ran into problems otherwise. We do not
> > > have to do it here. It does not help anything in my opinion.
> >
> > AFAIK, we did not have to do it, see
> > https://lkml.kernel.org/r/20161222143452.GK25166@xxxxxxxxxxxxxxx
> > and the comment starting with "It would still leave a small".
> >
> > Just for record, the idea of disabling the TIF flags came from Josh
> > in another mail. I have just repeated it.
> >
> > I think that the problem already is complex enough and the
> > serialization would reduce the space of potential races.
> > But it is possible that I see it just too complex here.
>
> IMO we can skip the mutex. The consistency model will be broken anyway,
> so all bets are off.

I just hope that I will never be forced to debug a system crash
after this operation.

Imagine a situation when we send a livepatch using the hybrid
consistency model that should be safe also in the immediate mode.
Some processes would get stacked. We suggest forcing because
it should be safe. And it will break. Then we will want to know
why this has happened. If the forcing is not serialized, we will
need to consider/check much more parallel operations.

But if I am the only one who think this way, it might mean
that I am over-pessimistic in this context. I will buy
some head bandage to be prepared and could live without
the serialization.

Best Regards,
Petr