Re: [PATCH 2/3] livepatch/rcu: Warn when system consistency is broken in RCU code

From: Josh Poimboeuf
Date: Wed May 10 2017 - 13:58:40 EST


On Wed, May 10, 2017 at 06:04:23PM +0200, Petr Mladek wrote:
> On Tue 2017-05-09 11:18:35, Josh Poimboeuf wrote:
> > if (in_nmi())
> > rcu_nmi_enter(); /* in case we're called before nmi_enter() */
>
> This does not work as expected. in_nmi() is implemented as
>
> (preempt_count() & NMI_MASK)
>
> These bits are set in nmi_enter(), see
>
> preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);
>
> Note that nmi_enter() calls rcu_nmi_enter() right after
> setting the preempt_count bit.
>
> It means that if in_nmi() returns true, we should already
> on the safe side regarding using rcu_read_lock()/unlock().

Ah, you're right. I was worried about the gap between the start of
do_nmi() and when it calls rcu_nmi_enter(), but it seems all the
functions it calls in that gap are 'notrace', so they couldn't be
patched anyway. And as you pointed out, in_nmi() wouldn't work.

> The patch was designed to use basically the same solution
> as is used in the stack tracer. It is using
> rcu_read_lock()/unlock() as we do.
>
> The stack tracer is different in the following ways:
>
> + It takes a spin lock. This is why it has to give
> up in NMI completely.
>
> + It disables interrupts. I guess that it is because
> of the spin lock as well. Otherwise, it would not
> be safe in IRQ context.
>
> + It checks whether local_irq_save() has a chance to
> work and gives up if it does not.
>
>
> On the other hand, the live patch handler:
>
> + does not need any lock => could be used in NMI
>
> + does not need to disable interrupts because
> it does not use any lock
>
> + checks if local_irq_save() actually succeeded.
> It seems more reliable to me.
>
>
> I am not sure if we all understand the problem.

No kidding :-)

> IMHO, the point is that RCU must be aware when we call
> rcu_read_lock()/unlock().
>
> My understanding is that rcu_irq_enter() tries to make RCU watching
> when it was not. Then rcu_is_watching() reports if we are on
> the safe side.
>
> But it is possible that I miss something. One question is if
> rcu_irq_enter()/exit() calls can be nested.
>
> I still need to think about it.

The code looks ok to me now, except for a few minor issues:

- The warning message should be more specific.

- The documentation should probably mention the name of the specific RCU
function which shouldn't be patched.

- The documentation might also mention that the warning could also be
triggered in early NMI or exception code, e.g. if there are any calls
to functions with fentry calls which have been patched.

- The code comment should probably refer to the documentation, otherwise
nobody will ever read it ;-)

--
Josh