Re: [PATCH 2/3] livepatch/rcu: Warn when system consistency is broken in RCU code
From: Petr Mladek
Date: Thu May 11 2017 - 08:44:29 EST
On Mon 2017-05-08 11:51:08, Josh Poimboeuf wrote:
> On Thu, May 04, 2017 at 12:55:15PM +0200, Petr Mladek wrote:
> > RCU is not watching inside some RCU code. As a result, the livepatch
> > ftrace handler might see ops->func_stack and some other flags in
> > a wrong state. Then a livepatch might make the system unstable.
> >
> > diff --git a/kernel/livepatch/patch.c b/kernel/livepatch/patch.c
> > index 4c4fbe409008..ffdf5fa8005b 100644
> > --- a/kernel/livepatch/patch.c
> > +++ b/kernel/livepatch/patch.c
> > @@ -62,6 +62,14 @@ static void notrace klp_ftrace_handler(unsigned long ip,
> > /* RCU may not be watching, make it see us. */
> > rcu_irq_enter_irqson();
> >
> > + /*
> > + * RCU still might not see us if we patch a function inside
> > + * the RCU infrastructure. Then we might see wrong state of
> > + * func->stack and other flags.
> > + */
> > + if (unlikely(!rcu_is_watching()))
> > + WARN_ONCE(1, "Livepatch modified a function that can not be handled a safe way.!");
> > +
> > rcu_read_lock();
> >
> > func = list_first_or_null_rcu(&ops->func_stack, struct klp_func,
>
> Also I wonder if we can constrain the warning somehow. I think the
> warning only applies if a patch is in progress, right?
I think that this was not addressed in the other mails.
I wanted to add some constrains but all my attempts have failed
so far. For example, I wanted to avoid the warning when the function
was patched with the immediate flag set. We would be on the safe side.
Such a patch could not be removed. And the consistency model is week
enough.
The problem is that the transaction might finish too early when a
patched function cannot be watched by RCU. It means that the finished
transaction does not mean that we are safe. Also a finished transaction
allows to start a new one. Then the ftrace handler might see an
outdated function stack. Therefore it is not sure if the previously
used patch was immediate ("safely" handled) ...
Fortunately, this situation is a real corner case. It might actually be
good that the problem is always reported. It helps to detect it during
testing and avoid sending such a patch to users.
> In that case, if RCU is broken, would it be feasible to
> mutex_trylock() the klp mutex to try to ensure that no patches are
> being applied while the ftrace handler is running? Then it wouldn't
> matter if RCU were broken because the func stack wouldn't be
> changing anyway. Then it could only warn if it failed to get the
> mutex.
If we could not guarantee that a function might be patched a safe way,
we should not allow the patching in the first place.
The problem here is that this situation is detected at runtime.
We should do our best to avoid a damage if this happens. The 3rd patch
is from this category. But we should make it as easy as possible
to catch potential problems during patch creation or testing.
Best Regards,
Petr