Re: [PATCH 3/3] livepatch: force transition process to finish

From: Josh Poimboeuf
Date: Tue May 30 2017 - 08:41:37 EST


On Mon, May 29, 2017 at 02:28:13PM +0200, Petr Mladek wrote:
> On Fri 2017-05-26 12:37:56, Josh Poimboeuf wrote:
> > On Thu, May 25, 2017 at 06:03:07PM +0200, Petr Mladek wrote:
> > > On Thu 2017-05-25 14:59:55, Miroslav Benes wrote:
> > > >
> > > > > > > In fact, I would suggest to take klp_mutex in force_store()
> > > > > > > and do all actions synchronously, including the check
> > > > > > > of klp_transition_patch.
> > > > > >
> > > > > > I still think it is better not do it. klp_unmark_tasks() does nothing else
> > > > > > than tasks already do. They call klp_update_patch_state() by themselves
> > > > > > and they do not grab klp_mutex lock for doing that. klp_unmark_tasks()
> > > > > > only forces this action.
> > > > >
> > > > > You have a point. But I am not convinced ;-) klp_update_patch_state()
> > > > > was called very carefully only when it was safe. The forcing
> > > > > intentionally breaks the consistency model. User should really know
> > > > > what they are doing when they use this feature.
> > > > >
> > > > > I think that we should actually taint the kernel. Developers should
> > > > > know when users were pulling their legs.
> > > >
> > > > We could do that. I can change pr_warn() to WARN_ON_ONCE(), which would of
> > > > course taint the kernel.
> > >
> > > Sounds good to me.
> >
> > I'm thinking that WARN_ON_ONCE() seems too severe. If the patch didn't
> > need a consistency model in the first place then it wouldn't be worth
> > warning about.
> >
> > We have to trust that the user knows what they're doing. And that's
> > true for the entire live patching process, including patch analysis and
> > patch creation. And anyway we already have a taint flag for that:
> > TAINT_LIVEPATCH.
>
> But the force is done on the user side. Let's say that the authors of
> the livepatch code and of the patches know what they are doing.
> Could we expect the same from the admins that apply the patches?
>
> TAINT_LIVEPATCH is set because the system behaves differently
> than with the original code. But it still should be consistent.
> Using the force migration might move the system to a wonder land.

True. If the patch creators don't want the user to "use the force", the
WARNING would be appropriate. Otherwise, if the patch creators *do*
want the user to force, the WARNING will be overkill and may alarm the
user.

When the user is root, we always have to trust them to a certain extent.
So I'd be more worried about dealing with the fallout of the false
WARNING. But I don't feel strongly about it either way.

> > > > > > On the other hand, I do not see a problem in doing that. We already have a
> > > > > > relationship between klp_mutex and tasklist_lock defined elsewhere, so it
> > > > > > is safe.
> > > > >
> > > > > Yup.
> > > > >
> > > > > > It would only serialize things needlessly.
> > > > >
> > > > > I do not agree. The speed is not important here. Also look
> > > > > into klp_reverse_transition(). We explicitly clear all
> > > > > TIF_PATCH_PENDING flags and call synchronize_rcu() just
> > > > > to make the situation easier and reduce space for potential
> > > > > mistakes.
> > > >
> > > > Yes, because we had to do that. We ran into problems otherwise. We do not
> > > > have to do it here. It does not help anything in my opinion.
> > >
> > > AFAIK, we did not have to do it, see
> > > https://lkml.kernel.org/r/20161222143452.GK25166@xxxxxxxxxxxxxxx
> > > and the comment starting with "It would still leave a small".
> > >
> > > Just for record, the idea of disabling the TIF flags came from Josh
> > > in another mail. I have just repeated it.
> > >
> > > I think that the problem already is complex enough and the
> > > serialization would reduce the space of potential races.
> > > But it is possible that I see it just too complex here.
> >
> > IMO we can skip the mutex. The consistency model will be broken anyway,
> > so all bets are off.
>
> I just hope that I will never be forced to debug a system crash
> after this operation.
>
> Imagine a situation when we send a livepatch using the hybrid
> consistency model that should be safe also in the immediate mode.
> Some processes would get stacked. We suggest forcing because
> it should be safe. And it will break. Then we will want to know
> why this has happened. If the forcing is not serialized, we will
> need to consider/check much more parallel operations.
>
> But if I am the only one who think this way, it might mean
> that I am over-pessimistic in this context. I will buy
> some head bandage to be prepared and could live without
> the serialization.

I don't feel strongly about this one either way either. But we all
agree that it doesn't need the mutex now, and I can't think of a
scenario where the code would change such that it would need it.

--
Josh