Re: [PATCH 18/19] rcu/context_tracking: Merge dynticks counter and context tracking states

From: Frederic Weisbecker
Date: Fri Mar 11 2022 - 11:35:36 EST


On Thu, Mar 10, 2022 at 12:32:22PM -0800, Paul E. McKenney wrote:
> On Wed, Mar 02, 2022 at 04:48:09PM +0100, Frederic Weisbecker wrote:
> > Updating the context tracking state and the RCU dynticks counter
> > atomically in a single operation is a first step towards improving CPU
> > isolation. This makes the context tracking state updates fully ordered
> > and therefore allows for later enhancements such as postponing some work
> > while a task is running isolated in userspace until it ever comes back
> > to the kernel.
> >
> > The state field becomes divided in two parts:
> >
> > 1) Lower bits for context tracking state:
> >
> > CONTEXT_IDLE = 1,
> > CONTEXT_USER = 2,
> > CONTEXT_GUEST = 4,
>
> And the CONTEXT_DISABLED value of -1 works because you can have only
> one of the above three bits set at a time?
>
> Except that RCU needs this to unconditionally at least distinguish
> between kernel and idle, given the prevalence of CONFIG_NO_HZ_IDLE=y.
> So does the CONTEXT_DISABLED really happen anymore?
>
> A few more questions interspersed below.

The value of CONTEXT_DISABLED is never stored in the ct->state. It is just
returned as is when CONTEXT_TRACKING is disabled. So this shouldn't conflict
with RCU.

> > @@ -452,15 +453,16 @@ void noinstr __ct_user_exit(enum ctx_state state)
> > * Exit RCU idle mode while entering the kernel because it can
> > * run a RCU read side critical section anytime.
> > */
> > - rcu_eqs_exit(true);
> > + ct_kernel_enter(true, RCU_DYNTICKS_IDX - state);
> > if (state == CONTEXT_USER) {
> > instrumentation_begin();
> > vtime_user_exit(current);
> > trace_user_exit(0);
> > instrumentation_end();
> > }
> > + } else {
> > + atomic_sub(state, &ct->state);
>
> OK, atomic_sub() got my attention. What is going on here? ;-)

Right :-)

So that's the case where user context tracking is running but RCU doesn't
track user mode. This is for example when NO_HZ_FULL=n but VIRT_CPU_ACCOUNTING_GEN=y.

I might remove that standalone VIRT_CPU_ACCOUNTING_GEN=y one day but for now
it's there.

Anyway so in this case we only want to track KERNEL <-> USER from context
tracking POV, but we don't need the RCU_DYNTICKS_IDX part, hence the
unordered atomic_sub(), which spares us the full ordering.

But it still needs to be atomic because NMIs may increment RCU_DYNTICKS_IDX on
the same field.
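
To illustrate, here is a minimal userspace sketch using C11 atomics. It is
not the kernel code: the demo_* names and the shift value of 3 are made up
for the example, only the CONTEXT_* values come from the patch. It shows a
relaxed add/sub on the context bits coexisting with an NMI-like increment of
the counter part of the same word.

```c
#include <assert.h>
#include <stdatomic.h>

/* Context bits as in the patch; the shift value below is an assumption. */
enum { DEMO_CONTEXT_IDLE = 1, DEMO_CONTEXT_USER = 2, DEMO_CONTEXT_GUEST = 4 };
#define DEMO_DYNTICKS_SHIFT 3
#define DEMO_DYNTICKS_IDX   (1 << DEMO_DYNTICKS_SHIFT)
#define DEMO_CONTEXT_MASK   (DEMO_DYNTICKS_IDX - 1)

/* Single word holding both the context bits and the counter. */
static atomic_int demo_state;

/* KERNEL -> USER when RCU does not track user: context bits only, no ordering. */
static void demo_user_enter_untracked(void)
{
	atomic_fetch_add_explicit(&demo_state, DEMO_CONTEXT_USER,
				  memory_order_relaxed);
}

/* USER -> KERNEL, the counterpart of the atomic_sub() in the hunk above. */
static void demo_user_exit_untracked(void)
{
	atomic_fetch_sub_explicit(&demo_state, DEMO_CONTEXT_USER,
				  memory_order_relaxed);
}

/* Stand-in for an NMI bumping the counter part of the same word. */
static void demo_nmi_bump(void)
{
	atomic_fetch_add(&demo_state, DEMO_DYNTICKS_IDX);
}

/* Low bits: current context. */
static int demo_context(void)
{
	return atomic_load(&demo_state) & DEMO_CONTEXT_MASK;
}

/* High bits: counter, left unshifted. */
static int demo_dynticks(void)
{
	return atomic_load(&demo_state) & ~DEMO_CONTEXT_MASK;
}
```

A plain read-modify-write here could lose the NMI's increment, which is why
the update stays atomic even though it does not need ordering.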


> > @@ -548,7 +550,7 @@ EXPORT_SYMBOL_GPL(context_tracking);
> > void ct_idle_enter(void)
> > {
> > lockdep_assert_irqs_disabled();
> > - rcu_eqs_enter(false);
> > + ct_kernel_exit(false, RCU_DYNTICKS_IDX + CONTEXT_IDLE);
> > }
> > EXPORT_SYMBOL_GPL(ct_idle_enter);
> >
> > @@ -566,7 +568,7 @@ void ct_idle_exit(void)
> > unsigned long flags;
> >
> > local_irq_save(flags);
> > - rcu_eqs_exit(false);
> > + ct_kernel_enter(false, RCU_DYNTICKS_IDX - CONTEXT_IDLE);
>
> Nice! This works because all transitions must be either from or
> to kernel context, correct?

Exactly. There is no such thing as IDLE -> USER -> GUEST, etc...
There has to be KERNEL in the middle of each. Because we never
call rcu_idle_enter() -> rcu_user_enter() for example. There has to be
rcu_idle_exit() in the middle.

(famous last words).
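
As a rough userspace sketch of why a single add is enough (C11 atomics, ex_*
names and the shift value of 3 are hypothetical, not what the patch uses):
since the context bits are zero in kernel context, adding IDX + ctx sets the
context and bumps the counter, and adding IDX - ctx clears it and bumps the
counter again.

```c
#include <assert.h>
#include <stdatomic.h>

/* Context values as in the patch, plus an explicit 0 for kernel. */
enum ex_ctx { EX_KERNEL = 0, EX_IDLE = 1, EX_USER = 2, EX_GUEST = 4 };

#define EX_SHIFT 3			/* assumed shift, illustration only */
#define EX_IDX   (1 << EX_SHIFT)
#define EX_MASK  (EX_IDX - 1)

static atomic_int ex_state;

/* Leave kernel context: one fully ordered add sets ctx and bumps the counter. */
static void ex_kernel_exit(enum ex_ctx to)
{
	int old = atomic_fetch_add(&ex_state, EX_IDX + (int)to);

	/* Only valid when coming from kernel context. */
	assert((old & EX_MASK) == EX_KERNEL);
}

/* Return to kernel context: one add clears ctx and bumps the counter. */
static void ex_kernel_enter(enum ex_ctx from)
{
	int old = atomic_fetch_add(&ex_state, EX_IDX - (int)from);

	/* Only valid when the word currently holds exactly that context. */
	assert((old & EX_MASK) == (int)from);
}

static int ex_ctx_now(void)
{
	return atomic_load(&ex_state) & EX_MASK;
}

static int ex_seq(void)
{
	return atomic_load(&ex_state) >> EX_SHIFT;
}
```

A direct IDLE -> USER transition would trip the assertion in ex_kernel_exit(),
which is the "KERNEL in the middle" rule expressed as a check.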

> > /* Return true if the specified CPU is currently idle from an RCU viewpoint. */
> > @@ -321,8 +321,7 @@ bool rcu_dynticks_zero_in_eqs(int cpu, int *vp)
> > int snap;
> >
> > // If not quiescent, force back to earlier extended quiescent state.
> > - snap = ct_dynticks_cpu(cpu) & ~0x1;
> > -
> > + snap = ct_dynticks_cpu(cpu) & ~RCU_DYNTICKS_IDX;
>
> Do we also need to get rid of the low-order bits? Or is that happening
> elsewhere? Or is there some reason that they can stick around?

Yep, ct_dynticks_cpu() clears the low order CONTEXT_* bits.
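
Roughly, in a standalone sketch (the snap_* helpers and the shift value of 3
are hypothetical and only mimic what ct_dynticks_cpu() and the hunk above do
on a raw state word): the context bits get masked off first, then clearing
the RCU_DYNTICKS_IDX bit plays the role that clearing 0x1 played before.

```c
/* Assumed layout for illustration: 3 low context bits, counter above them. */
#define SNAP_SHIFT        3
#define SNAP_IDX          (1u << SNAP_SHIFT)
#define SNAP_CONTEXT_MASK (SNAP_IDX - 1u)

/* Mimics ct_dynticks_cpu(): drop the low-order CONTEXT_* bits. */
static unsigned int snap_dynticks(unsigned int word)
{
	return word & ~SNAP_CONTEXT_MASK;
}

/* Mimics the hunk: also clear the lowest counter bit of the snapshot. */
static unsigned int snap_force_eqs(unsigned int word)
{
	return snap_dynticks(word) & ~SNAP_IDX;
}
```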

> > diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> > index 9bf5cc79d5eb..1ac48c804006 100644
> > --- a/kernel/rcu/tree_stall.h
> > +++ b/kernel/rcu/tree_stall.h
> > @@ -459,7 +459,7 @@ static void print_cpu_stall_info(int cpu)
> > rdp->rcu_iw_pending ? (int)min(delta, 9UL) + '0' :
> > "!."[!delta],
> > ticks_value, ticks_title,
> > - rcu_dynticks_snap(cpu) & 0xfff,
> > + (rcu_dynticks_snap(cpu) >> RCU_DYNTICKS_SHIFT) & 0xfff ,
>
> Actually, the low-order several bits are useful when debugging, so
> could you please not shift them away? Maybe also go to 0xffff to allow
> for more bits taken?

Yeah that makes sense, I'll change that.

Thanks a lot for the reviews!