Re: [PATCH 3/3] livepatch: force transition process to finish

From: Petr Mladek
Date: Wed May 24 2017 - 11:09:56 EST

Next message: Tejun Heo: "Re: [PATCH] workqueue: Implement delayed_work_busy()"
Previous message: Arnd Bergmann: "Re: CPU_BIG_ENDIAN in generic code (was: Re: [PATCH v3 3/7] arch/sparc: Define config parameter CPU_BIG_ENDIAN)"
In reply to: Miroslav Benes: "Re: [PATCH 3/3] livepatch: force transition process to finish"
Next in thread: Miroslav Benes: "Re: [PATCH 3/3] livepatch: force transition process to finish"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wed 2017-05-24 16:15:49, Miroslav Benes wrote:
> On Wed, 24 May 2017, Petr Mladek wrote:
>
> > On Thu 2017-05-18 14:00:43, Miroslav Benes wrote:
> > > If a task sleeps in a set of patched functions uninterruptibly, it could
> > > block the whole transition process indefinitely. Thus it may be useful
> > > to clear its TIF_PATCH_PENDING to allow the process to finish.
> > >
> > > Admin can do that now by writing 2 to force sysfs attribute in livepatch
> > > sysfs directory. TIF_PATCH_PENDING is then cleared for all tasks and the
> > > transition can finish successfully.
> > >
> > > Important note! Use wisely. Admin must be sure that it is safe to
> > > execute such action. This means that it must be checked that by doing so
> > > the consistency model guarantees are not violated.
> > >
> > > diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> > > index bb61aaa196d3..d057a34510e6 100644
> > > --- a/kernel/livepatch/transition.c
> > > +++ b/kernel/livepatch/transition.c
> > > @@ -591,3 +591,19 @@ void klp_send_fake_signal(void)
> > > }
> > > read_unlock(&tasklist_lock);
> > > }
> > > +
> > > +/*
> > > + * Drop TIF_PATCH_PENDING of all tasks on admin's request. This forces an
> > > + * existing transition to finish.
> > > + */
> > > +void klp_unmark_tasks(void)
> > > +{
> > > + struct task_struct *g, *task;
> > > +
> > > + pr_warn("all tasks marked as migrated on admin's request\n");
> > > +
> > > + read_lock(&tasklist_lock);
> > > + for_each_process_thread(g, task)
> > > + klp_update_patch_state(task);
> > > + read_unlock(&tasklist_lock);
> >
> > This should get called under klp_mutex. The following race comes to my mind:
> >
> > CPU0: CPU1:
> >
> > klp_transition_work_fn()
> > klp_try_complete_transition()
> > for_each_process()
> > if (!klp_try_switch_task(task))
> >
> > # success
> >
> > klp_complete_transition()
> >
> > for_each_process()
> > task->patch_state = KLP_UNDEFINED;
> >
> >
> > klp_unmark_tasks()
> > for_each_process()
> > klp_update_patch_state()
> > task->patch_state =
> > klp_target_state;
> >
> > klp_target_state = KLP_UNDEFINED;
> >
> > => CPU1 might happily set an obsolete state and create a mess.
>
> This should not happen. klp_update_patch_state() use
> test_and_clear_tsk_thread_flag(task, TIF_PATCH_PENDING) and only if true,
> task->patch_state is set.
>
> And all TIF_PATCH_PENDING are cleared when you get
> klp_complete_transition().

You are right. I missed that klp_update_patch_state() checked
the TIF flag before setting the state.

> > In fact, I would suggest to take klp_mutex in force_store()
> > and do all actions synchronously, including the check
> > of klp_transition_patch.
>
> I still think it is better not do it. klp_unmark_tasks() does nothing else
> than tasks already do. They call klp_update_patch_state() by themselves
> and they do not grab klp_mutex lock for doing that. klp_unmark_tasks()
> only forces this action.

You have a point. But I am not convinced ;-) klp_update_patch_state()
was called very carefully only when it was safe. The forcing
intentionally breaks the consistency model. User should really know
what they are doing when they use this feature.

I think that we should actually taint the kernel. Developers should
know when users were pulling their legs.

> On the other hand, I do not see a problem in doing that. We already have a
> relationship between klp_mutex and tasklist_lock defined elsewhere, so it
> is safe.

Yup.

> It would only serialize things needlessly.

I do not agree. The speed is not important here. Also look
into klp_reverse_transition(). We explicitly clear all
TIF_PATCH_PENDING flags and call synchronize_rcu() just
to make the situation easier and reduce space for potential
mistakes.

Best Regards,
Petr

Next message: Tejun Heo: "Re: [PATCH] workqueue: Implement delayed_work_busy()"
Previous message: Arnd Bergmann: "Re: CPU_BIG_ENDIAN in generic code (was: Re: [PATCH v3 3/7] arch/sparc: Define config parameter CPU_BIG_ENDIAN)"
In reply to: Miroslav Benes: "Re: [PATCH 3/3] livepatch: force transition process to finish"
Next in thread: Miroslav Benes: "Re: [PATCH 3/3] livepatch: force transition process to finish"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]